Executive Summary


The  overall  objective  of  this  project  was  to  investigate  novel  hybrid  techniques  that  combine 
concepts  from  quantum  and  classical  computer  science  to  solve  hard  computational  problems, 
including  the  handling  of  uncertainty.  The  research  problems  considered  include  design 
optimization  and  simulation  of  conventional  CMOS  and  quantum  systems,  fault  tolerance, 
resource  allocation  and  scheduling,  strategy  optimization  and  related  challenges  facing  the  Air 
Force.  The  research  focuses  on  accurate  modeling  of  practical  metrics  of  performance,  robustness, 
and  cost,  and  their  optimization  in  both  linear  and  non-linear  domains,  using  fast  exact  and 
heuristic  methods,  along  with  highly  efficient  data  representations.  Errors  in  data  and  control  due 
to  environmental  effects,  as  well  as  uncertainty  in  the  problem  formulation,  are  taken  into  account 
during  system  modeling  and  optimization. 

The  project’s  main  accomplishments  include  the  following: 

•  An  analytical  study  of  probabilistic  fault  models  in  digital  logic  and  their  impact  on  overall 
circuit  performance.  Probabilistic  faults  affect  electronics  in  high-altitude  and 
high-radiation  environments,  especially  state-of-the-art  deep-submicron  CMOS  chips. 

•  New  algorithmic  methodologies  and  tools  for  circuit  synthesis  and  test  to  mitigate  the 
impact  of  probabilistic  faults.  These  methodologies  include  fast  evaluation  of  circuit 
reliability  based  on  functional  simulation,  and  incremental  modification  of  a  circuit  to 
improve  its  robustness. 

•  Analysis  of  mobile  (wireless)  ad  hoc  communication  networks,  focused  on  network 
throughput  and  total  power  consumption. 

•  A  non-linear  programming  framework  for  spatial  optimization  of  mobile  networks,  its 
empirical  evaluation,  and  visualization  of  results. 

•  A  new  algorithm  to  simulate  quantum  circuits  which  exhibits  polynomial-time 
performance  in  several  important  cases.  This  algorithm  and  accompanying  theoretical 
results  have  been  subsequently  used  by  other  researchers  to  show  that  the  Quantum  Fourier 
Transform  (QFT)  can  be  simulated  in  polynomial  time  on  conventional  computers. 

•  Several techniques for verifying the correctness of quantum circuits. These techniques
are based on computational engines frequently used in Electronic Design Automation,
namely Boolean satisfiability (SAT) and binary decision diagrams (BDDs). They fall under
the category of equivalence checking and can verify the results of adapting known circuits
to specific device architectures, such as linear ion traps.

2. Introduction


This  project  encompasses  several  topics  of  interest  to  the  Air  Force — from  near-term  to 
long-term — that  concern  hybrid  classical-quantum  systems  affected  by  uncertainty.  The  work 
reported  includes  algorithmic  techniques  and  methodologies  to  simulate,  compare  and  evaluate 
hybrid  systems,  as  well  as  to  optimize  them  for  robustness,  performance  and  resource  utilization. 


2.1.  Topics  Addressed 

One  of  the  near-term  efforts  focused  on  probabilistic  (non-deterministic)  faults  and  errors  in 
electronic  devices,  particularly  in  semiconductor  chips  that  are  manufactured  with  increasing 
device  densities  and  miniature  devices  susceptible  to  transient  particle  strikes.  Our  results  include 
an  extensive  suite  of  algorithmic  techniques  to  represent  transient  faults  and  quickly  evaluate  their 
impact  on  a  large  circuit.  We  also  developed  a  methodology  for  hardening  a  given  circuit  by 
inserting  a  small  amount  of  redundancy  in  carefully  chosen  sections  of  the  circuit.  This 
methodology  improves  robustness  of  the  circuit,  while  decreasing  area  and  power  overhead.  We 
also  carried  out  one  of  the  first  studies  of  circuit  testing  for  non-deterministic  faults.  Observing 
that  existing  test  patterns  may  have  to  be  repeated  in  order  to  observe  transient  faults,  we 
developed  algorithms  for  calculating  replication  factors  for  given  tests. 

A  second  near-term  effort  focused  on  mobile  (wireless)  ad  hoc  networks  (MANETs),  whose 
structure  can  be  determined  by  the  locations  and  power  levels  of  nodes  and  relays.  We  studied  the 
impact  of  these  locations  on  network  throughput  and  total  power  consumption,  as  well  as  run-time 
reconfiguration  of  MANETs.  To  this  end,  we  developed  a  non-linear  programming  framework 
which  determines  locations  so  as  to  achieve  the  best  compromise  between  resource  consumption 
and  network  throughput.  The  technique  was  evaluated  on  a  number  of  realistic  test-cases  under 
dynamic  scenarios  where  network  parameters  change.  The  incremental  changes  in  network 
structure  can  be  visualized  and  communicated  to  a  human  operator  in  real  time. 

Longer-term  topics  studied  in  our  work  include  algorithms  for  simulation  and  comparison  of 
quantum  circuits.  This  research  pursued  several  complementary  goals,  such  as  exploring  the 
limitations  of  quantum  computing,  developing  engineering  tools  to  aid  in  the  construction  of 
prototype  quantum  processors,  and  exploring  possible  applications  of  quantum  techniques.  In 
particular,  we  developed  new  algorithms  for  simulating  quantum  circuits  that  achieve 
polynomial-time  efficiency  in  important  cases,  such  as  all  depth-three  quantum  circuits  and  certain 
approximate circuits for the Quantum Fourier Transform. These results show that more
sophisticated  quantum  circuits  are  required  to  achieve  computational  speed-ups. 

In conjunction with the need to optimize quantum circuits to particular device architectures, such
as linear ion traps, we developed several techniques for quantum circuit verification. One of these
is based on the quantum information decision diagram (QuIDD) data structure introduced in our
past DARPA-funded project. This technique can handle several phase-equivalence relations
relevant to quantum circuits, but is relatively slow. Another technique is the use of quantum and
reversible miters, in conjunction with circuit simplification. It can significantly reduce the
complexity of quantum verification instances and is compatible with other techniques, including
QuIDD-based methods. A third technique is based on Boolean satisfiability, and capitalizes on the
ongoing success of SAT techniques in industrial verification of digital logic.

We also started exploring simulation of atomic-scale systems represented by Ising models,
especially in conjunction with number factoring through quantum adiabatic optimization. To this
end, we developed two techniques for energy minimization in Ising spin-glasses. One is a
branch-and-bound algorithm that finds ground states on 100 spins in one day, and the other is a
local-search heuristic that approximates ground states on 1,000,000 spins in one day. Both can be
easily parallelized to multi-core CPUs and distributed systems.


2.2.  Project  Participants 

Faculty at the University of Michigan who led this research were Professor John Hayes and
Professor Igor Markov. Current and former graduate students participating in this project at the
University of Michigan included Dr. Smita Krishnaswamy, Dr. George Viamontes, and Hector
Garcia. Lastly, Dr. Ilia Polian from the University of Freiburg and Professor Shigeru Yamashita of
Ritsumeikan University collaborated with us.

3. Logic Circuits Subject to Uncertainty


3.1.  Summary 

Integrated circuits (ICs) are becoming increasingly susceptible to uncertainty caused by soft errors,
inherently  probabilistic  devices,  and  manufacturing  variability.  To  address  these  issues,  we  have 
developed  methods  for  analyzing,  designing,  and  testing  circuits  subject  to  probabilistic  effects. 
The  main  contributions  of  this  work  are:  fast,  soft-error  rate  (SER)  analysis  methods  and  software 
tools  that  use  functional-simulation  signatures  to  capture  error  effects,  novel  design  techniques  and 
software  tools  that  improve  reliability  using  little  area  and  performance  overhead,  a  matrix-based 
reliability-analysis  framework  (probabilistic  transfer  matrices  or  PTMs)  that  can  capture  many 
types  of  probabilistic  faults,  and  test  generation  and  compaction  methods  aimed  at  probabilistic 
faults  in  logic  circuits. 

Further  details  concerning  the  material  in  this  chapter  can  be  found  in  Smita  Krishnaswamy’s  2008 
Ph.D.  dissertation  [12]  and  related  publications  [13-21]. 

3.2.  Introduction 

Digital  systems  have  always  been  vulnerable  to  a  variety  of  manufacturing  and  wearout  defects. 
Over  time,  IC  technology  scaling  has  increased  device  sensitivity  to  soft  errors  caused  by  external 
noise  or  radiation  that  temporarily  affects  circuit  behavior  without  permanently  damaging  the 
hardware.  With  the  advent  of  nanoscale  computing,  soft  errors  are  beginning  to  affect  not  only 
memory  but  also  combinational  logic.  Unlike  memory,  errors  in  combinational  logic  cannot  be 
easily  corrected  and  can  lead  to  potentially  disastrous  failures  in  error-critical  systems  such  as 
aircraft,  medical  devices,  and  servers.  New  device  technologies  such  as  carbon  nanotubes  and 
quantum  computers  exhibit  inherently  probabilistic  behavior  due  to  quantum-mechanical  effects. 
Resilience  under  these  sources  of  uncertainty  is  vital  for  continued  technology  and  performance 
improvements.  Due  to  the  high  cost  and  power  consumption  of  ICs,  the  widespread  addition  of 
redundancy  is  not  a  practical  option  for  curtailing  error  rates.  Instead,  careful  circuit  analysis  and 
low-cost  methods  of  improving  reliability  are  necessary.  Further,  circuits  must  be  tested 
post-manufacture  for  their  vulnerability  to  both  transient  faults  and  manufacturing  defects. 

A  soft  error  is  a  signal  that  has  an  incorrect  logic  value  but  does  not  imply  a  permanent  defect. 
Such  errors  are  one  of  the  main  causes  of  uncertainty  and  failure  in  logic  circuits  [34].  They  can  be 
caused by cosmic rays, α-particles, and even thermal noise. When a particle strikes the sensitive
area of a logic gate, it can cause an ionized track known as a single-event upset (SEU), as shown in
Figure 1. An SEU is a transient or soft fault, as opposed to a permanent fault. SEU effects do not
propagate  if  the  charge  deposited  is  below  the  critical  charge  Qcrit  required  to  switch  the 
corresponding  transistor  on  or  off.  If  an  SEU  deposits  enough  charge  to  cause  a  spurious  signal 
pulse  or  glitch  in  the  circuit,  it  produces  a  soft  error.  Error  propagation  from  the  fault  site  to  a 
flip-flop  or  primary  output  is  stopped  if  there  is  no  sensitized  path  for  error  propagation.  If  a  soft 
error  is  transmitted  to  and  captured  (latched)  by  a  flip-flop,  it  can  persist  in  a  system  for  many 
clock  cycles. 
A single latched error can also fan out to multiple flip-flops. Unlike errors in memory, errors in
combinational logic cannot be rectified by error-correcting codes without incurring significant
area overhead. Hence, it becomes vital to find ways to accurately analyze and decrease the
soft-error rate of a circuit through careful design.


Figure  1.  Ionized  track  in  a  transistor  caused  by  cosmic  radiation  [34]. 


3.3.  Soft-Error  Rate  Estimation 

Several  factors  determine  the  SER  of  a  logic  circuit.  An  SEU  must  have  sufficient  energy  to 
change  a  signal  and  propagate  the  erroneous  signal  value  through  subsequent  gates;  if  not,  the  fault 
is  electrically  masked.  The  signal  change  must  propagate  through  the  logic  to  affect  a  primary 
output;  if  not,  the  fault  is  logically  masked.  The  fault  must  reach  a  flip-flop  during  the  sensitive 
portion  (latching  window)  of  a  clock  cycle;  if  not,  the  fault  is  temporally  masked.  These  various 
masking  effects  depend  on  the  characteristics  of  the  gates  encountered  by  a  fault  on  its  way  to  the 
primary  outputs,  as  well  as  on  the  particular  paths  taken.  Any  path  taken  must  have 
non-controlling  values  on  side  inputs,  so  different  input  vectors  can  sensitize  different  sets  of 
paths. 

The SER can be computed using the basic algorithm shown in Figure 2. Here, $P_{err}$ is the
probability of an error on a signal node. It is computed using the following variables: $P(i)$, the
probability of vector $i$ being applied to the input; $P_{strike}(n)$, the probability of a fault at node $n$;
$P_{attenuate}(p)$, the probability of attenuation along path $p$; and $P_{latch}(p, o)$, the probability of an
error on $p$ arriving at output $o$ during clock latching. Neglecting to model any of these factors
leads  to  overestimation  of  the  SER.  This  algorithm  is  only  practical  for  the  smallest  of  circuits,  as 
the  number  of  possible  sensitized  paths  grows  exponentially  in  the  size  of  the  circuit  [30]. 
Therefore,  even  determining  the  probability  of  logical  masking  is  NP-hard. 

compute_SER(circuit C)
{
    for (input vector i)
        for (node n ∈ C)
            for (output o ∈ C)
                for (sensitized path p ∈ path(i, n))
                    P_prop(p) = (1 − P_attenuate(p))
                    P_err(C) += P(i) P_strike(n) P_prop(p) P_latch(p, o)
    return P_err(C)
}


Figure  2.  Basic  SER  computation  algorithm. 
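
The cost of the enumeration in Figure 2 can be seen on a toy example. The following is a minimal sketch, not the report's implementation: it assumes a hypothetical three-gate netlist, uniformly distributed input vectors, a single constant strike probability, no electrical attenuation, and certain latching, so that only logical masking is modeled. It also replaces explicit path enumeration with fault injection (inverting one node and re-simulating), which detects the same sensitized faults on a circuit this small.

```python
from itertools import product

# Hypothetical netlist in topological order: node -> (gate type, fan-in names).
NETLIST = {
    "n1": ("AND", ("a", "b")),
    "n2": ("OR", ("b", "c")),
    "out": ("AND", ("n1", "n2")),
}
INPUTS = ("a", "b", "c")
GATES = {"AND": lambda x, y: x & y, "OR": lambda x, y: x | y}


def simulate(vector, flipped=None):
    """Evaluate the netlist; optionally invert one gate output to model a fault."""
    values = dict(zip(INPUTS, vector))
    for node, (kind, fanin) in NETLIST.items():
        values[node] = GATES[kind](*(values[f] for f in fanin))
        if node == flipped:
            values[node] ^= 1
    return values["out"]


def exhaustive_ser(p_strike):
    """Sum P(i) * P_strike(n) over all (vector, node) pairs not logically masked."""
    vectors = list(product((0, 1), repeat=len(INPUTS)))
    p_vector = 1.0 / len(vectors)  # assumption: uniform input distribution
    total = 0.0
    for vector in vectors:  # 2^n iterations: the source of the exponential cost
        good = simulate(vector)
        for node in NETLIST:
            if simulate(vector, flipped=node) != good:
                total += p_vector * p_strike
    return total
```

The outer loop visits all 2^n input vectors, which is why this style of analysis does not scale beyond very small circuits.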

Several software tools exist for estimating the SER of combinational circuits. Soft-error rate
analysis (SERA) [38] follows Figure 2 and, using designer-specified input vectors, finds all paths
from each gate to an output. SEU-induced glitches are simulated on representative inverter chains
of the same lengths as the target paths to determine the probability of electrical masking. Fast
analysis of soft error (FASER) [39] uses binary decision diagrams (BDDs) to enumerate all possible
input vectors. A BDD is created for each gate in a circuit: a static BDD for gates outside a glitch's
cone of influence, and duration and amplitude BDDs for gates within that cone. The BDDs are
combined in a way that allows the width and amplitude of glitches to be systematically analyzed
with respect to electrical masking. FASER's BDD representations can consume a large amount of
memory. Single event transient (SET) [31] proceeds in topological order and considers each gate
only once. For each gate, SET encodes the probability and shape of a glitch as a Weibull
probability density function called a SER descriptor (SERD). The Weibull parameters are
modified at each gate to account for electrical attenuation, and the new output SERDs are passed
on to succeeding gates. The SET algorithm is similar to static timing analysis (STA) and so does
not consider false paths. Figure 3 summarizes the characteristics of the tools described above, as
well as their methods for incorporating masking mechanisms. Because of their different
assumptions and vastly different ways of computing SER, they can yield very different SER values
for the same circuit.


Attribute            SERA                        FASER                  SET
-------------------  --------------------------  ---------------------  ---------------------
Logic masking        Vector simulation           BDD-based analysis     Vector simulation
Timing masking       SER derating                No details given       SER derating
Electrical masking   Inverter-chain simulation   Gate characterization  Gate characterization
Fault assumptions    Single                      Single                 Multiple

Figure 3. Summary of differences between three SER evaluation tools.

Our work aims to build SER analysis tools that are scalable and can be used early in the logic
design  phase  [15,  18,  19].  Due  to  our  emphasis  on  reliability-driven  logic  design,  we  focus  on 
modeling  logical  masking  both  accurately  and  efficiently.  We  then  use  our  tools  to  guide  several 
design  techniques  to  improve  circuit  resilience  against  soft  errors. 

3.3.1.  Reliable  Design  Methods 

Techniques  for  transient-fault  tolerance  have  been  developed  for  use  at  nearly  all  stages  of  the 
design  flow.  They  rely  on  enhancing  masking  mechanisms  to  mitigate  error  propagation.  Here, 
we  discuss  several  techniques  and  highlight  their  masking  mechanisms.  Faults  can  be  detected  at 
the  architectural  level  via  some  costly  form  of  redundancy  and  can  be  corrected  by  rolling  back  to 
a  checkpoint  to  replay  instructions  from  that  checkpoint.  Techniques  that  involve  replicating  an 
entire circuit increase chip area significantly and, therefore, decrease chip yield. To limit this cost,
Mohanram and Touba [26] proposed to partially triplicate logic by selecting regions of the circuit
that  are  especially  susceptible  to  soft  errors.  Such  regions  are  selected  by  simulating  faults  with 
random  test  vectors.  More  recently,  Almukhaizim  et  al.  [2]  presented  a  design  modification 
technique,  called  rewiring,  to  increase  reliability.  In  the  spirit  of  [2],  our  work  focuses  on 
lightweight  modifications  that  increase  reliability  without  requiring  large  amounts  of  redundancy. 

Chip  manufacturers  routinely  test  their  chips  for  SER  [23,  36,  37].  This  is  normally  accomplished 
through  field  testing  or  accelerated  testing.  In  field  testing,  many  devices  are  connected  to  testers 
and  evaluated  over  several  months  under  normal  operating  conditions.  In  accelerated  testing, 
devices are irradiated with neutron or α-particle beams, thus shortening test time to a few hours.
There  is  some  difficulty,  however,  in  translating  the  SER  obtained  by  accelerated  testing  to  that  of 
field  testing  [11].  For  instance,  intense  radiation  can  cause  multiple  simultaneous  errors, 
triggering  system  failures  more  often  than  normal. 

As  an  alternative  to  field  testing,  Hayes,  Polian  and  Becker  [9]  propose  a  non-concurrent  built-in 
self-test  (BIST)  architecture  for  online  testing.  They  define  the  impact  of  various  soft  faults  on  the 
circuit  in  terms  of  frequency,  observability,  and  severity.  For  instance,  more  frequent  and 
observable  faults  are  considered  more  influential  than  rare  faults.  With  this  fault  characterization, 
integer linear programming (ILP) is used to generate tests for various objectives, such as ensuring
a minimum fault-detection probability. Researchers have sought to accelerate testing by
employing test patterns that sensitize faults. Conceptually, the main difference between testing for
hard errors and testing for soft errors is that soft errors are only present for a fraction of the test time.
Therefore, test vectors must be repeated to detect faults, and they must be selected to sensitize the
most frequent faults. Sanyal et al. [33] accelerate testing by selecting a set of error-critical nodes
and deriving test sets that, using ILP, sensitize the maximum number of these faults. In our work,
which  preceded  [33],  we  developed  a  way  of  identifying  error-sensitive  test  vectors  for  multiple 
faults,  and  we  devised  algorithms  for  generating  test  sets  to  accelerate  SER  testing  [13,  14]. 

3.3.2. Probabilistic Circuit Analysis


Circuit design and testing call for new types of probabilistic analysis that go beyond soft-error
analysis. In our work, we developed a novel probabilistic matrix-based model for gates, and
we use matrix operations and symbolic methods to evaluate overall circuit error probabilities, as
discussed later. In earlier work, Bahar et al. [5] proposed to model and design carbon nanotube
(CNT)-based neural networks using Markov random fields (MRFs). MRFs specify joint
probability distributions in terms of local conditional probabilities, but they can also describe
cyclic dependencies. A neural network is described by an MRF with node values computed by a
weighted sum of conditional probabilities of a neighboring clique of nodes. This is known as the
Gibbs formulation and lends itself to optimizing for clique energy, which is translated into low
node error probabilities in [5]. Related to this, Nepal et al. [27] present a method for implementing
MRF-based circuits in CMOS, while Bhaduri et al. [4] describe a software tool called Nanolab,
which uses the algorithm from [5] to automate the design of fault-tolerant architectures.

In work more recent than ours, Rejimon and Bhanja [32] propose capturing errors in nanoscale
circuits by means of Bayesian networks. (A Bayesian network is a directed graph with nodes
representing variables and edges representing dependence relations among the variables.) If there
is an edge from node a to node b, then a is a parent of b. The joint probability distribution for n
variables is represented as the product of the conditional probability distributions:

$$\prod_{i=1}^{n} P[x_i \mid parents(x_i)]$$


To carry out numerical calculations on a Bayesian network, each node $x_i$ is labeled with a
probability distribution, conditioned on its parents. The probability distribution can be given in
tabular form. Primary inputs are given pre-defined probabilities and the probabilities of other
nodes are then computed using a method called belief propagation. Joint probabilities are
computed in large Bayesian networks using sampling methods such as importance sampling.
Many tools [28] exist for Bayesian network analysis.
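
The factorization above can be illustrated on a three-node network. The sketch below is only an illustration under stated assumptions: the network (two primary inputs feeding one noisy AND gate), the inversion probability EPS, and the uniform input probabilities are all hypothetical, and the marginal is obtained by brute-force enumeration of the joint distribution rather than by belief propagation or sampling.

```python
from itertools import product

# Hypothetical network: primary inputs a and b are the parents of gate output y.
PARENTS = {"a": (), "b": (), "y": ("a", "b")}
EPS = 0.1  # assumed probability that the AND gate output is inverted


def cpd(node, value, parent_values):
    """Tabular model of P[node = value | parents = parent_values]."""
    if node in ("a", "b"):
        return 0.5  # assumed pre-defined primary-input probabilities
    ideal = parent_values[0] & parent_values[1]
    return 1.0 - EPS if value == ideal else EPS


def joint(assignment):
    """Product over all nodes of P[x_i | parents(x_i)] for one full assignment."""
    prob = 1.0
    for node, parents in PARENTS.items():
        prob *= cpd(node, assignment[node], tuple(assignment[p] for p in parents))
    return prob


def marginal(node, value):
    """Sum the joint distribution over all assignments with node = value."""
    names = list(PARENTS)
    total = 0.0
    for bits in product((0, 1), repeat=len(names)):
        assignment = dict(zip(names, bits))
        if assignment[node] == value:
            total += joint(assignment)
    return total
```

Enumeration is exponential in the number of nodes, which is the reason larger networks rely on belief propagation or sampling instead.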

3.4.  Signature-based  Soft-error  Analysis 

As discussed above, it is important to be able to efficiently and accurately analyze SERs during the
actual design process. We now present our SER analyzer called analysis of soft-error rate
(AnSER). This tool uses functional simulation to estimate logic masking and to account for the
input-vector dependence in timing and electrical masking. Since exact analysis is impractical for
all but the smallest of circuits, we estimate parameters like signal probability and observability,
which are closely connected to the probability of error propagation, using a new and very efficient
signature-based method.



Figure 4 illustrates the flow of computation in AnSER. Functional-simulation signatures are
computed  from  logical  information,  error-derating  factors  from  gate-characterization  information, 
and  error-latching  windows  from  static-timing  analysis.  These  smaller  computations  are 
combined  to  form  an  estimate  of  circuit  SER.  Since  AnSER  is  intended  to  be  used  alongside 
logical  and  physical  design  tools,  we  pay  particular  attention  to  runtime,  memory  requirements, 
and  the  incremental-use  model. 


Figure  4.  Overall  design  flow  of  AnSER. 

The  remainder  of  this  chapter  develops  our  method  for  computing  the  SER  of  logic  circuits  by 
accounting  for  logic  masking,  extends  this  methodology  to  sequential  circuits,  and  incorporates 
timing  and  electrical  masking  into  our  SER  estimates.  Further  details  of  the  techniques  and  results 
appear  in  [12]. 

3.4.1.  SER  in  Combinational  Logic 

We  now  consider  a  SER  analysis  method  for  combinational  logic  which,  by  definition,  contains  no 
memory.  We  first  develop  fault  models  for  soft  errors.  Then,  we  provide  background  on 
functional-simulation  signatures,  which  are  used  extensively  in  AnSER.  Next,  we  derive  SER 
algorithms  for  single  and  multiple  fault  assumptions  using  signal  probability  and  observability 
measures  that  are  computed  using  signatures. 

3.4.2. Faults, Signatures and Observability Don't Cares

We first formulate a model for transient faults that is suitable for SER analysis. Formally, a
transient stuck-at (TSA) fault is a triple $(g, v, P_{err}(g))$ where $g$ is a circuit node, $v$ is a logic value,
and $P_{err}(g)$ is the probability per clock cycle of a stuck-at-0/1 fault when the node has correct
value $v$. The advantage of basing a fault model on the standard stuck-at (SA) model for permanent
faults is that the same automatic test pattern generation (ATPG) tools can be used for SA and TSA
faults. The TSA fault model assumes that at most one fault occurs in any clock cycle. This
assumption is common in SER research because for most technologies, the intrinsic error rate is
very low. The contribution of each gate to the SER depends on the SEU rate of the particular gate,
as captured by $P_{err}(g)$, and on the observability of the error. The TSA model can readily be
extended to several types of multiple transient faults.

A signature, denoted $sig(g) = F_g(X_1) F_g(X_2) \ldots F_g(X_K)$, is the sequence of logic values observed at
node $g$ in response to applying a sequence of $K$ input vectors $X_1, X_2, \ldots, X_K$ to the circuit. We use
node signatures for three purposes: to compute the SER, to identify error-sensitive areas of the
circuit, and to identify redundant nodes for resynthesis. Here, $F_g(X_i)$ indicates the value appearing
at $g$ in response to $X_i$, so the signature partially specifies the Boolean function $F_g$ realized by $g$.
Applying all possible input vectors (exhaustive simulation) generates a signature that corresponds
to a full truth table. In general, $sig(g)$ can be seen as a kind of “supersignal” composed of
individual binary signals that are defined by some current set of vectors. Like the individual
signals, $sig(g)$ can be processed by EDA tools such as simulators and synthesizers as a single entity.
This processing can take advantage of bitwise operations available in CPUs to speed up the
computation compared to processing the signals that compose $sig(g)$ one at a time.

Signatures  with  thousands  of  bits  can  be  useful  in  pruning  non-equivalent  nodes  during 
equivalence  checking  [29,  40].  A  related  speedup  technique  is  also  the  basis  for  “parallel”  fault 
simulation  [7].  Figure  5  shows  a  5-input  circuit  where  each  of  the  10  nodes  is  labeled  by  an  8-bit 
signature  computed  with  eight  input  vectors.  These  vectors  are  randomly  generated,  and 
conventional  functional  simulation  propagates  signatures  to  the  internal  and  output  nodes.  In  a 
typical  implementation  such  as  ours,  signatures  are  stored  as  logical  words  and  manipulated  with 
64-bit  logical  operations,  ensuring  high  simulation  throughput.  Therefore  64  vector  simulations 
are conducted in parallel with each signature processed. Generating $K$-bit signatures in an $N$-node
circuit takes $O(NK)$ time.
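
The bit-parallel simulation described above can be sketched as follows. This is a minimal illustration, not the AnSER code: the netlist format (a topologically ordered list of `(node, op, fanin)` tuples), the function names, and the supported gate set are assumptions made for the example. Python integers stand in for the 64-bit machine words.

```python
import random

OPS = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "XOR": lambda x, y: x ^ y,
}


def random_input_signatures(inputs, k=64, seed=0):
    """Draw one k-bit random signature per primary input (k random vectors)."""
    rng = random.Random(seed)
    return {name: rng.getrandbits(k) for name in inputs}


def propagate_signatures(netlist, input_sigs, k=64):
    """Simulate all k vectors at once; netlist must be in topological order."""
    full = (1 << k) - 1  # keeps bitwise NOT within k bits
    sigs = dict(input_sigs)
    for node, op, fanin in netlist:
        if op == "NOT":
            sigs[node] = ~sigs[fanin[0]] & full
        else:
            sigs[node] = OPS[op](sigs[fanin[0]], sigs[fanin[1]])
    return sigs


def signal_probability(sig, k=64):
    """Fraction of 1s in a signature, an estimate of P[node = 1]."""
    return bin(sig).count("1") / k
```

Each gate is visited once and each visit costs one word-level operation per 64 vectors, which matches the $O(NK)$ bound stated above.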

Figure 5. Signatures, ODC masks and testability information in a small circuit.

Observability don't-cares (ODCs) occur at a node $g$ for input vectors for which the value at $g$ does
not affect the primary outputs. For example, in the circuit AND(a, OR(a, b)), the output of the OR
gate is inconsequential when a = 0. Hence, input vectors 00 and 01 are ODCs for b. Corresponding
to the $K$-bit signature $sig(g)$, the ODC mask of $g$ is the $K$-bit sequence whose $i$-th bit is 0 if input
vector $X_i$ is in the don't-care set of $g$; otherwise the $i$-th bit is 1, that is,
$ODCmask(g) = X_1 \notin ODC(F_g)\; X_2 \notin ODC(F_g) \ldots X_K \notin ODC(F_g)$.
The ODC mask is computed by bitwise inverting $sig(g)$
and re-simulating through the fan-out cone of $g$ to check if the changes are propagated to any of the
primary outputs. ODC masks can be computed exactly in time $O(N^2)$ for a circuit with $N$ gates.
We found the faster heuristic algorithm presented in [29], which has only $O(N)$ complexity,
convenient to use. Figure 5 shows a sample 8-bit signature and the accompanying ODC mask for
each node of the example 10-node circuit.
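
The exact invert-and-re-simulate procedure can be sketched on the AND(a, OR(a, b)) example. This is an illustrative sketch of the exact method only, not the heuristic of [29]; the node name `g_or`, the netlist encoding, and the choice of exhaustive 4-bit signatures (vectors ab = 00, 01, 10, 11 from most to least significant bit) are assumptions made for the example.

```python
# out = AND(a, OR(a, b)), listed in topological order.
NETLIST = [("g_or", "OR", ("a", "b")), ("out", "AND", ("a", "g_or"))]
OPS = {"AND": lambda x, y: x & y, "OR": lambda x, y: x | y}


def simulate(input_sigs, k, inverted=None):
    """Bit-parallel simulation; optionally invert the signature of one node."""
    full = (1 << k) - 1
    sigs = dict(input_sigs)
    if inverted in sigs:  # the inverted node is a primary input
        sigs[inverted] ^= full
    for node, op, (x, y) in NETLIST:
        sigs[node] = OPS[op](sigs[x], sigs[y])
        if node == inverted:  # the inverted node is a gate output
            sigs[node] ^= full
    return sigs


def odc_mask(input_sigs, k, node, outputs=("out",)):
    """Bit i is 1 if inverting `node` under vector i changes some primary output."""
    good = simulate(input_sigs, k)
    bad = simulate(input_sigs, k, inverted=node)
    mask = 0
    for out in outputs:
        mask |= good[out] ^ bad[out]
    return mask


# Exhaustive signatures for the four input vectors 00, 01, 10, 11.
EXHAUSTIVE = {"a": 0b0011, "b": 0b0101}
```

With these signatures, `odc_mask(EXHAUSTIVE, 4, "g_or")` evaluates to `0b0011`: the OR output is observable only for the two vectors with a = 1. The mask for b comes out as all zeros, because a = 1 already forces the OR output to 1 regardless of b. Re-simulating once per node is what gives the exact method its quadratic cost.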

3.4.3.  SER  Evaluation 

We  compute  the  SER  by  counting  the  number  of  test  vectors  that  propagate  the  effects  of  a 
transient  fault  to  outputs.  Figure  6  summarizes  our  algorithm  for  SER  computation.  It  involves 
two  traversals  of  the  target  circuit:  one  to  propagate  signatures  forward  and  another  to  propagate 
ODC masks backwards. The fraction of 1s in a node's signature is an estimate of its signal
probability, while the relative proportion of 1s in an ODC mask indicates observability. The two
measures  are  combined  to  obtain  a  testability  figure-of-merit  for  each  node  of  interest,  which  is 
then  multiplied  by  the  probability  of  the  associated  TSA  to  obtain  the  SER  for  the  node.  This 
approach  can  be  contrasted  with  technology-dependent  SER  estimates,  which  include  timing  and 
electrical  masking. 

compute_TSA_SER(Circuit C, int K)
{
    compute_sigs(C, K)
    compute_odc_approx(C, K)
    for (all nodes g ∈ C)
        test0(g) = ones(~sig(g) & ODCmask(g)) / K
        test1(g) = ones(sig(g) & ODCmask(g)) / K
        P_err(C) += P_err0(g) test1(g)
        P_err(C) += P_err1(g) test0(g)
    return P_err(C)
}


Figure  6.  Algorithm  to  compute  SER  uuder  the  TSA  fault  model. 


We estimate the probability $P[g = 1]$ of signal $g$ having logic value 1 by the fraction of 1s in the
signature $sig(g)$; this is often called the controllability of $g$. The observability $P[obs(g)]$ of $g$ is the
probability that a change in the signal's value changes a primary output, and is approximated by
the fraction of 1s in $g$'s ODC mask. The 1-testability of $g$ is the probability that its correct value is
1 and that it is observable.


\[ P[\mathit{test}_1(g)] = \mathit{ones}(\mathit{sig}(g) \,\&\, \mathit{ODCmask}(g)) / K \]

Similarly, 0-testability is the fraction of positions where the ODC mask is 1 and the signature is 0.
Consider again the circuit in Figure 5. For node g we have sig(g) = 01011011 and ODCmask(g) =
01000100. Hence, P[g = 1] = 5/8, P[g = 0] = 3/8, P[obs(g)] = 2/8, and P[test0(g)] = P[test1(g)] =
1/8.

In  a  circuit  C,  we  sum  the  SER  contributions  of  the  gates,  and  weight  the  gate  error  probabilities  by 
the  testability  for  each  TSA  fault.  Hence,  we  can  write: 

\[ P_{err}(C) = \sum_{g \in C} \left( P[\mathit{test}_1(g)]\, P_{err0}(g) + P[\mathit{test}_0(g)]\, P_{err1}(g) \right) \]


For example, if each gate in Figure 5 has TSA-1 probability Perr0 = p and TSA-0 probability
Perr1 = q, then the SER is given by Perr(C) = 2p + (13/8)q. The metrics test0 and test1 implicitly
incorporate error sensitization and propagation conditions. Hence, the last equation accounts for
the possibility of an error being logically masked.
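
The weighted sum can be sketched in a few lines. This is an illustration under stated assumptions rather than the AnSER code: the function name `tsa_ser` and the two-node data set are hypothetical (they are not the circuit of Figure 5), and exact fractions are used so that the result is easy to check by hand.

```python
from fractions import Fraction

def count_ones(x):
    return bin(x).count("1")

def tsa_ser(nodes, k, perr0, perr1):
    """Sum Perr0*test1 + Perr1*test0 over all nodes.

    nodes maps a node name to (signature, ODC mask), both k-bit integers.
    """
    full = (1 << k) - 1
    total = Fraction(0)
    for sig, odc in nodes.values():
        test1 = Fraction(count_ones(sig & odc), k)
        test0 = Fraction(count_ones(~sig & full & odc), k)
        total += perr0 * test1 + perr1 * test0
    return total

# Hypothetical signatures and masks for two nodes.
NODES = {
    "g": (0b01011011, 0b01000100),
    "h": (0b11110000, 0b11111111),
}
SER = tsa_ser(NODES, 8, Fraction(1, 100), Fraction(1, 50))
```

A node whose ODC mask is all zeros contributes nothing to the sum, which is how logical masking enters the estimate.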

3.5. SER in Sequential Logic


We  can  extend  our  SER  analysis  to  handle  sequential  circuits,  which  contain  state  storage  elements 
(D  flip-flops).  The  combinational  logic  computes  state  information  and  primary  output  values  as  a 
function  of  the  current  state  and  primary  inputs.  Three  factors  to  consider  while  analyzing 
sequential-circuit  reliability  are:  steady-state  probability  distribution,  state  reachability,  and 
sequential  observability. 


3.5.1.  Steady-State  and  Reachability  Analysis 

In order to approximate the steady-state distribution, we perform sequential simulation using
signatures. Assume that a circuit with m flip-flops is in an initial state {s1, s2, ..., sm}. Our
method starts in this state for each simulation run (sets of 64 states are processed in parallel), then
we simulate the circuit for n cycles. Each cycle propagates signatures through the combinational
logic and stops when flip-flops are reached. Primary input values are generated randomly from
some fixed probability distribution. At the end of each simulation cycle, flip-flop inputs are
transferred to flip-flop outputs, which are, in turn, fed into combinational logic for the subsequent
cycle. All intermediate signatures are erased before the next simulation cycle starts. The K-bit
signatures of the flip-flops at the end of n simulation cycles define K states. We claim that for
large enough n, these states are sampled from the steady-state probability distribution. Empirical
results suggest that most ISCAS benchmarks reach steady-state in 10 or fewer cycles, under the
above operating conditions [25].
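
The multi-cycle procedure can be sketched as below. This is a minimal illustration, assuming a hypothetical two-flip-flop circuit (`next_state`), uniformly random primary inputs, and an all-zero reset state; none of these details come from the report. Each bit position of a 64-bit word is one independent simulation run.

```python
import random

K = 64                      # runs processed in parallel, one per bit position
FULL = (1 << K) - 1

def next_state(state, a, b):
    """Hypothetical combinational logic feeding two flip-flops x and y."""
    x, y = state
    x_in = (a & y) | (x & ~b & FULL)
    y_in = (a ^ x) & FULL
    return (x_in, y_in)

def sequential_signatures(n_cycles, seed=0, reset=(0, 0)):
    """Return the flip-flop signatures after n_cycles of bit-parallel simulation."""
    rng = random.Random(seed)
    state = reset                          # every run starts in the reset state
    for _ in range(n_cycles):
        a = rng.getrandbits(K)             # fresh random primary inputs each cycle
        b = rng.getrandbits(K)
        state = next_state(state, a, b)    # flip-flop inputs become outputs
    return state

def sampled_state(signatures, i):
    """Extract the state reached by run i as a tuple of flip-flop bits."""
    return tuple((s >> i) & 1 for s in signatures)
```

After `sequential_signatures(10)`, the 64 tuples returned by `sampled_state` for i = 0..63 play the role of the K sampled states described above.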

Additionally, our signature-based SER analysis methods can handle systems that are
decomposable. Such systems pass through some transient states and are then confined to a set of
strongly connected closed (SCC) states. That is, the system can be partitioned into transient states
and sets of SCC states. For such systems, the steady-state distribution strongly depends on the
initial  states.  We  address  this  implicitly  by  performing  reachability  analysis  starting  at  a  reset  state. 
Thus,  each  bit  of  the  signature  corresponds  to  a  simulation  that  starts  from  a  reset  state  and 
propagates  through  the  combinational  logic,  moves  to  adjacent  reachable  states,  and,  for  a  large 
enough  n,  reaches  steady-state  within  the  partition. 

Figure 7. Illustration of bit-parallel sequential simulation.


Using our algorithm, simulating a circuit with g gates and K-bit signatures for n simulation cycles takes
time O(Kng). Note that it does not require matrix-based analysis, which is often the bottleneck in
other methods. For example, Markov matrices are used to encode state-transition probabilities
explicitly, and so can be large due to state-space explosion [25]. Figure 7 shows an example of
sequential simulation with 3-bit signatures. The flip-flops with outputs x and y are initialized to
000 in cycle T0, then the combinational logic is simulated. For cycle T1, the inputs of x and y are
transferred to the output, and the process continues. At the end of the simulation, the values for x
and y at T3 are saved for sequential-error analysis, as explained below.


3.5.2.  Error  Persistence  and  Sequential  Observability 

To assess the impact of soft faults on sequential circuits, we analyze several cycles through which
faults persist, using time-frame expansion. This means making n copies of the circuit C_0, C_1, ..., C_{n-1},
thereby converting a sequential circuit into a pseudo-combinational one with n time frames. The
outputs of the flip-flops of the k-th frame are connected to the primary inputs of frame k + 1 for 0 ≤
k < n - 1. Flip-flop outputs that feed into the first frame (k = 0) are treated as primary inputs, and
flip-flop inputs of frame n are treated as primary outputs. Figure 8 shows a three-time-frame
circuit that corresponds to Figure 7. Note how new primary inputs and outputs are created,
corresponding to the inputs from flip-flops for frame 0 and outputs of flip-flops for frame 3.
Intermediate flip-flops are represented by buffers.
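
Time-frame expansion can be sketched as a netlist transformation. The data layout (dictionaries keyed by signal name), the `name@frame` naming scheme, and the function `time_frame_expand` shown here are assumptions made for illustration; only flip-flop inputs of the last frame are reported as outputs, and the circuit's own primary outputs are omitted for brevity.

```python
def time_frame_expand(gates, flipflops, primary_inputs, n_frames):
    """Unroll a sequential netlist into n_frames combinational copies.

    gates:          name -> (op, [input names]), in topological order
    flipflops:      flip-flop output name -> name of the signal driving its input
    primary_inputs: names of the circuit's primary inputs
    Returns (expanded gates, new primary inputs, new primary outputs).
    """
    expanded = {}
    # Flip-flop outputs feeding frame 0 become primary inputs.
    inputs = [f"{ff}@0" for ff in flipflops]
    for k in range(n_frames):
        inputs += [f"{pi}@{k}" for pi in primary_inputs]
        if k > 0:
            # Intermediate flip-flops are replaced by buffers between frames.
            for ff, d in flipflops.items():
                expanded[f"{ff}@{k}"] = ("BUF", [f"{d}@{k - 1}"])
        for name, (op, ins) in gates.items():
            expanded[f"{name}@{k}"] = (op, [f"{i}@{k}" for i in ins])
    # Flip-flop inputs of the last frame become primary outputs.
    outputs = [f"{d}@{n_frames - 1}" for d in flipflops.values()]
    return expanded, inputs, outputs

# Hypothetical one-gate circuit: flip-flop x is driven by d = AND(a, x).
EXPANDED, INS, OUTS = time_frame_expand(
    {"d": ("AND", ["a", "x"])}, {"x": "d"}, ["a"], 3)
```

The expanded dictionary stays in topological order, so the same forward signature propagation used for combinational circuits can be applied to it directly.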

Figure 8. Illustration of time-frame expansion into three frames: C0, C1, C2.

Observability is analyzed by considering all n frames as a single combinational circuit, thus
allowing the single-fault SER analysis described in the previous section to be applied to sequential
circuits. Other useful information, such as the average number of cycles during which faults
persist, can also be determined using time-frame expansion. After the multi-cycle sequential
simulation described above, we store the signatures of the flip-flops and use these signatures to stimulate
the newly created primary inputs in the time-frame expanded circuit. For instance, the x0 and y0
inputs of the circuit in Figure 8 are simulated with the corresponding signatures, marked T3 (the
final signature after multi-cycle simulation is finished). Randomly generated signatures are used
for primary inputs not corresponding to flip-flops (such as a0 and b0 in Figure 8).

After simulation, we perform ODC analysis starting from the primary outputs and flip-flop inputs
of the n-th frame C_n and moving all the way to the inputs of the 0-th frame. In other words, errors
in primary outputs and flip-flops are considered to be observable. Figure 9 gives our algorithm for
sequential SER computation. The value of n can be varied until the SER stabilizes, i.e., does not
change appreciably from frame n to frame n+1. The n-frame ODC analysis can lead to different
gates being seen as critical for SER. For instance, the designer can deem errors that persist longer
than n cycles as more critical than errors that are quickly flushed at primary outputs. In this case,
the ODC analysis only considers the fan-in cones of the primary outputs of C_n.


compute_seq_SER(Circuit C, int K, int n, int f)
{
    for (i < n)
        seq_simulate(C, K)
    C' = time_frame_expand(C, f)
    copy_flipflop_inputs(C', C)
    compute_sigs(C', K)
    compute_odc_approx(C', K)
    for (all nodes g ∈ C_0)
        test0(g) = ones(~sig(g) & ODCmask(g)) / K
        test1(g) = ones(sig(g) & ODCmask(g)) / K
        Perr(C') += (Perr0(g) test1(g) + Perr1(g) test0(g))
    return Perr(C')
}

Figure 9. Algorithm to compute SER in sequential circuits under TSA faults.

The SER algorithm in Figure 6 runs in linear time with respect to circuit size, since each simulation
is linear and ODC analysis (even with n time frames) runs in linear time as well. To capture
electrical masking in AnSER, we derate gate-error probabilities by a factor dependent upon the
characterization of successor gates. Previous research has shown that electrical masking
eliminates low-energy SEUs in 3-4 levels of logic and has little effect thereafter [26]. This implies
that considering paths of limited length starting from the gate in question is often sufficient to
approximate this effect. We have also developed a linear-time algorithm in the spirit of static
timing analysis for computing the error-latching window (ELW) of every gate in a circuit; see [12]
for details.

3.5.3.  Empirical  Validation 

We  now  report  some  empirical  results  for  SER  analysis  using  AnSER  and  our  SER-aware 
synthesis  techniques.  The  experiments  were  conducted  on  a  2.4  GHz  AMD  Athlon  4000+ 
workstation  with  2GB  of  RAM.  The  algorithms  were  implemented  in  C++.  For  validation 
purposes,  we  compare  AnSER  with  complete  test-vector  enumeration  using  the  ATPG  tool 
ATALANTA  [22].  We  provided  ATALANTA  with  a  list  of  all  possible  SA  faults  in  the  circuit  to 
generate  tests  in  “diagnostic”  mode,  which  calculates  all  test  vectors  for  each  fault.  We  used  an 
intrinsic gate-fault value of gerr0 = gerr1 = 1 x 10^ on all faults. Since TSA faults are SA faults
that  last  only  one  cycle,  the  probability  of  a  TSA  fault  causing  an  output  error  is  equal  to  the 
number  of  test  vectors  for  the  corresponding  SA  fault,  weighted  by  their  frequency.  Assuming  a 
uniform  input  distribution,  the  fraction  of  vectors  that  detect  a  fault  provides  an  exact  measure  of 
its  testability.  Then,  we  computed  the  SER  by  weighting  the  testability  with  a  small  gate  fault 
probability. While the exact computation can be performed only for small circuits, Figure 10
suggests that our algorithm is accurate to about 3% for 2,048 simulation vectors.


Circuit    No. gates   ATALANTA   AnSER     % Error   AnSER Exact-ODC   % Error
c17        13          6.96E-7    6.96E-7   0.01      6.96E-7           0.01
majority   21          6.25E-6    6.63E-6   6.05      6.57E-6           4.87
decod      25          2.60E-5    2.62E-5   0.83      2.60E-5           0.83
b1         25          1.28E-5    1.31E-5   2.81      1.27E-5           0.78
pm1        68          2.86E-5    3.00E-5   4.70      2.97E-5           3.5
tcon       80          5.30E-5    5.39E-5   1.67      5.35E-5           0.94
x2         86          3.78E-5    3.87E-5   2.38      3.93E-5           3.97
z4ml       92          5.29E-5    5.37E-5   1.50      5.41E-5           2.20
parity     111         7.60E-5    7.69E-5   1.24      7.71E-5           1.45
pcle       115         5.38E-5    5.34E-5   0.75      5.35E-5           0.56
pcler8     140         7.06E-5    7.24E-5   2.52      7.23E-5           2.41
mux        188         1.58E-5    1.38E-5   12.54     1.63E-5           3.16
Ave.                                        3.06                        2.65

Figure 10. Comparison of SER (FIT) data for AnSER and ATALANTA.

We isolate the effects of two possible sources of inaccuracy: sampling inaccuracy, and inaccuracy
due to approximate ODC computation. Sampling inaccuracy is due to the incomplete enumeration
of the input space. Approximate ODCs computed using the algorithm from [29] incur inaccuracy
due to mutual masking: when an error is propagated through two reconvergent paths, the errors
may cancel. However, the results in Figure 10 indicate that most of the inaccuracy is due to
sampling, not approximate ODCs. The last two columns, corresponding to exact ODC
computation, show an average error of 2.65%. Therefore, only 0.41% of the error is due to the
approximate ODC computation. On the other hand, while enumerating the entire input space is
intractable, our use of bit-parallel computation enables significantly more vectors to be sampled
than other techniques for the same computation time.

To characterize the gates in the circuits accurately, we adapted data from [31], where several gate
types are analyzed in a 130nm, 1.2 Vdd technology via simulation program with integrated circuit
emphasis (SPICE) simulations. We use an average SER value of gerr0 = gerr1 = 8 x 10'^ for all
gates. The SER analyzers from [31, 38, 39] report error rates that differ by orders of magnitude:
SERA tends to report error rates on the order of 10' for 180nm technology nodes, and FASER
reports error rates on the order of 10'^ for 100nm. Furthermore, although our focus is logic
masking, we also approximate electrical masking by scaling our fault probabilities at nodes by a
small derating factor to obtain trends similar to those of [31]. In Figure 11, we compare AnSER
and SERD when computing SER for inverter chains of varying lengths. Since one path is always
sensitized in this circuit, it helps us estimate the derating factor.


Figure 11. SER trends on inverter chains produced by SERD and AnSER (x-axis: number of inputs).


Figure 12 compares AnSER with previous work on some ISCAS-85 benchmarks. While the runtimes
in [8] include 50 runs, the runtimes in [31] are reported per input vector. Thus, we multiply data
from [31] by the number of vectors (2,048) used there; our runtimes appear better by several orders
of magnitude. We believe that this is due to the use of bit-parallel functional simulation to
determine logic masking, which has a strong input-vector dependency. Most other work uses fault
simulation or symbolic methods.

                        Time (s)
Circuit   No. gates   AnSER    SERD   FASER   [24]
c432      246         0.111    10     22      -
c880      591         <0.01    10     -       -
c1355     746         0.014    20     40      2.09
c1908     760         0.015    20     66      0.781
c3540     1951        <0.01    60     149     5m42s
c6288     4836        1.00     120    278     -

Figure 12. Runtime comparisons of four SER analyzers.

Figure  13  shows  changes  in  SER  when  timing  masking  is  considered.  Incorporating  timing 
masking  into  SER  is  useful  in  guiding  physical  synthesis  operations,  while  considering  logic 
masking  alone  suffices  for  technology-independent  logic  synthesis. 


Circuit     No. gates   Clock        Logic SER   Time   Timing SER   Time   Potential
                        period (s)   (FIT)       (s)    (FIT)        (s)    % improvement
aes_core    20263       3.68E-07     0.1634      6      9.33E-03     3      37.37
spi         2998        3.19E-07     0.03722     1      4.23E-03     1      13.28
s35932      5343        6.18E-07     0.1363      2      6.03E-03     1      26.73
s38417      6714        3.36E-07     0.1360      2      1.22E-04     1      37.83
tv80        6802        6.79E-07     0.03602     2      2.64E-03     1      37.30
mem_ctrl    11062       6.44E-07     0.2183      2      8.43E-03     3      19.64
ethernet    36227       1.46E-06     0.7010      9      1.31E-04     9      91.68
usb_funct   10337       3.06E-07     0.1832      3      8.79E-04     3      36.39

Figure 13. SER evaluation with logic and timing masking.

In  summary,  efficient  analysis  methods  are  necessary  for  assessing  and  reducing  the  SER  of  a 
circuit. AnSER, our linear-time method for logic-level soft-error analysis, achieves its low
runtimes through functional simulation signatures, which enable fast and accurate
computation of signal probability and observability, even in the presence of reconvergent fan-out. We
analyzed  sequential  circuits  using  AnSER  and  employing  multicycle  simulation  and  time-frame 
expansion.  In  addition,  we  incorporated  timing  masking  through  error-latching  windows  which 
were  computed  using  timing  analysis  information.  Our  results  on  the  standard  benchmarks 
generally  showed  2  to  3  orders  of  magnitude  speed-up  over  previous  SER  analyzers,  and  high 
accuracy  when  validated  against  the  ATALANTA  ATPG  tool. 

3.6.  Design  for  Robustness 

At  the  gate  level,  soft  errors  have  traditionally  been  eliminated  via  costly  time  or  space  redundancy. 
However,  as  we  show  here,  it  is  possible  to  achieve  major  improvements  in  reliability  without 
resorting  to  massive  redundancy.  In  combinational  logic,  an  SEU  only  affects  the  primary  outputs 
if  it  is  propagated  through  the  intermediate  gates.  A  basic  way  that  designers  can  improve  a 
circuit’s  reliability  is  to  ensure  that  faults  are  logically  masked  with  high  probability.  We  target 
logic  and  timing  masking  to  obtain  soft-error-tolerant  circuits  in  the  following  ways:  1)  by 
identifying  and  using  partial  redundancy  already  present  within  the  circuit,  to  mask  errors;  2)  by 
selecting  error-sensitive  areas  of  the  circuit  for  replication  or  hardening;  3)  by  generating  many 
candidate  rewrites  for  each  subcircuit  and  selecting  among  them  for  improvements  in  area  and 
SER;  and  4)  by  increasing  timing  masking  during  physical  design. 

3.6.1. Signature-Based Design


We now describe a technique called signature-based design for reliability (SiDeR), which
identifies redundancy already present in the circuit and utilizes it to increase logic masking. As
previously discussed, signatures provide partial information about the Boolean function of a node.
Therefore, candidate nodes with similar functionality can be identified by matching signatures.
We exploit the fact that nodes need not implement identical Boolean functions to bolster reliability.
Any node that provides predictable information about another can be used to mask errors. For
instance, if two internal nodes x and y satisfy the property (y = 1) ⇒ (x = 1), where ⇒ denotes
"implies", then y gives information about x whenever y = 1. More generally, if f(x0, x1, x2, ..., xn) =
x, then x can be replaced by f to logically mask errors that are propagated through x. However,
errors at x are only masked in cases where x does not control f.

We can increase the number of potential candidates that can replicate x by taking ODCs into
account. In terms of signatures, this corresponds to bitwise ANDing sig(f) and sig(x) with
ODCmask(x) to check for the following relation:

sig(f) & ODCmask(x) = sig(x) & ODCmask(x)
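
The masked comparison is a one-line check on integer signatures. The helper name `covers_under_odc` and the example values are hypothetical and serve only to illustrate the relation.

```python
def covers_under_odc(sig_f, sig_x, odc_x):
    """True when f agrees with x on every vector for which x is observable."""
    return (sig_f & odc_x) == (sig_x & odc_x)

sig_x = 0b01011011
odc_x = 0b01000100           # x is observable only under vectors 6 and 2

agrees = covers_under_odc(0b11000001, sig_x, odc_x)     # matches x where it matters
disagrees = covers_under_odc(0b00000100, sig_x, odc_x)  # differs on observable vectors
```

Because the comparison ignores positions where x is unobservable, f may differ from x there, which is what enlarges the candidate set.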

Figure 14(b) shows an example of replicated logic for node a, derived by utilizing don't-care
values and signatures.


Figure  14.  (a)  Rewriting  a  sub-circuit  to  improve  area,  and  (b)  Finding  a  candidate  cover  for  a. 


In order to limit area overhead, the function f must be efficiently constructed from x0, x1, ..., xn.
Therefore, we only consider cases where f is implemented by a single AND or OR gate. We add
redundant logic by transforming node x into OR(x, y) or AND(x, y). This requires that either
(y = 1) ⇒ (x = 1) or (x = 1) ⇒ (y = 1), respectively, which makes candidate pairs x and y easy to
identify. When OR(x, y) = x, it
follows that sig(x) ≥ sig(y) lexicographically; otherwise, sig(y) is 1 in a position where sig(x) is not.
Therefore, sorting the signatures can narrow the search for candidate signals y. Also, sig(x) must
contain at least as many 1s as sig(y), so maintaining an additional list of size-sorted signatures and
intersecting the two lists can prune the search. Multiple lexicographical sorts and multiple size
sorts of signatures starting from different bit positions can further narrow the search.
Consequently, signature-based redundancy identification can efficiently perform logic implication.
Generally, several candidates satisfy implication relations for each node x.
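
A simplified version of the OR-candidate search can be sketched as follows. It uses only the size-sorted pruning step (not the lexicographic sorts), and the function name `or_candidates` and the signature values are hypothetical. A signature match is only evidence of an implication, so candidates still need formal verification.

```python
def ones(x):
    return bin(x).count("1")

def or_candidates(x, sigs):
    """Names y whose signatures suggest (y = 1) => (x = 1), so that OR(x, y) = x."""
    sx = sigs[x]
    # Size-sorted list: candidates with more 1s than sig(x) can be skipped.
    pool = sorted((n for n in sigs if n != x), key=lambda n: ones(sigs[n]))
    found = []
    for y in pool:
        if ones(sigs[y]) > ones(sx):
            break                       # all remaining signatures are too large
        if (sigs[y] & ~sx) == 0:        # sig(y) is never 1 where sig(x) is 0
            found.append(y)
    return found

SIGS = {
    "x": 0b01011011,
    "y1": 0b01001000,   # subset of sig(x): candidate
    "y2": 0b10000000,   # 1 where sig(x) is 0: rejected
    "y3": 0b01011111,   # more 1s than sig(x): pruned by size
}
CANDIDATES = or_candidates("x", SIGS)
```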
Among the candidates, we choose a node y that most often controls the output of the additional
OR/AND gate, and whose fan-in cone is maximally disjoint from that of x. Once we find
candidates for resynthesis, a SAT solver is used to verify the implication relation. Our basic
process for verifying circuit optimizations with SAT follows that of [6].

AnSER  also  makes  use  of  local  rewriting,  a  general  synthesis  technique  that  optimizes  small 
subcircuits  to  obtain  overall  design  improvements.  Rewriting  relies  on  the  fact  that  different 
irredundant  circuits  corresponding  to  the  same  Boolean  function  can  exhibit  different  properties. 
We  optimize  circuits  for  SER  and  area  simultaneously  by  using  AnSER  to  accept  or  reject  rewrites, 
following  the  implementation  of  rewriting  in  [1,  24].  This  technique  first  derives  a  4-input  cut  for 
a  selected  node  defining  a  one-output  subcircuit.  Next,  replacement  candidates  are  looked  up  in 
hash  tables  that  store  several  alternative  implementations  of  each  function.  To  ensure  global 
reliability  improvement,  we  resimulate  the  circuit  and  update  SER  estimates.  Computational 
efficiency is achieved through fast incremental updates. As shown in Figure 14(a), the original
subcircuit  with  three  gates  can  be  rewritten  with  just  two  gates. 

Figure 15. Improvements in SER obtained by SiDeR.

3.6.2. Empirical Validation


We now report some empirical results for the various design techniques presented in this section.
Figure  15  shows  SER  and  area  overhead  improvements  obtained  by  SiDeR.  The  first  set  of  results 
is  for  exact  implication  relationships,  i.e.,  not  considering  ODCs.  The  second  column  shows  the 
use  of  ODCs  to  increase  the  number  of  candidates.  In  both  cases,  AND/OR  gates  are  added  based 
on  the  functional  relationship  satisfied.  We  see  an  average  29%  improvement  in  SER  with  only 
5%  area  overhead  without  ODCs.  The  improvements  for  the  ODC  covers  are  40%  with  area 
overhead  of  13%,  suggesting  a  greater  gain  per  additional  unit  area  than  the  partial  triple  modular 
redundancy  (TMR)  techniques  in  [26],  which  achieve  a  91%  improvement  but  increase  area  by 
104%  on  average. 

Figure 16 illustrates the use of AnSER to guide the local rewriting method implemented in the
ABC logic-synthesis package [1]. AnSER calculates the global SER impact of each local change
to decide whether or not to accept the change. After checking hundreds of rewriting possibilities,
those that improve SER and have limited area overhead are retained. The data indicate that, on
average, SER decreases by 10.7%, while area decreases by 2.3%. For instance, for alu4, a circuit
with 740 gates, we achieve 29% lower SER while reducing area by 0.5%. Although area
optimization is often thought to hurt SER, these results show that carefully guided logic
transformations can eliminate this problem.



Circuits   No. gates   No. rewrites   SER decrease (%)   Area decrease (%)   Time (s)
alu4       740         13             29.3               0.5                 24.5
b1         14          0              0.0                0.0                 0.2
b9         114         8              6.8                0.9                 0.3
C1355      536         97             1.2                9.0                 37.6
C3540      1055        23             5.8                0.9                 51.5
C432       215         68             5.5                1.4                 12.1
C499       432         37             0.0                0.5                 13.0
C880       341         7              0.2                0.0                 5.4
cordic     84          5              1.2                1.2                 0.5
dalu       1387        58             24.0               3.2                 35.0
des        4252        282            11.2               0.1                 12.3
frg2       1228        96             27.9               2.0                 8.9
i10        2824        143            5.0                0.6                 16.7
i9         952         83             31.4               11.7                35.3
Ave.                                  10.7               2.3                 18.1

Figure 16. Improvements in SER obtained with local rewriting.


3.7.  Probabilistic  Transfer  Matrices 

We  now  move  to  a  more  general  reliability-analysis  framework  that  treats  circuits  entirely 
probabilistically.  While  this  is  useful  for  analyzing  soft  errors,  it  is  also  useful  for  analyzing 
devices  that  periodically  fail  or  behave  probabilistically  during  regular  operation.  Examples  of 
such  devices  include  probabilistic  CMOS,  molecular  logic  circuits,  and  quantum  computers.  In 
general,  accurate  reliability  analysis  involves  computing  not  just  a  single  output  distribution  but, 
rather,  the  output  error  probability  for  each  input  pattern.  In  cases  where  each  gate  experiences 
input-pattern  dependent  errors,  even  if  the  input  distribution  is  fixed,  simply  computing  the  output 
distribution  does  not  give  the  overall  circuit  error  probability.  For  instance,  if  an  XOR  gate 
experiences  a  bit-flip  error,  then  the  output  distribution  is  unaffected,  but  the  wrong  output  is 
paired  with  each  input.  Therefore,  we  need  to  separately  compute  the  error  associated  with  each 
input  vector. 
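
The XOR example can be made concrete with two small matrices. The sketch below assumes uniformly distributed inputs and an output that flips with probability 1; the matrix layout (rows are input combinations 00, 01, 10, 11, columns are output values 0 and 1) and the helper names are illustrative choices, not taken from the report.

```python
from fractions import Fraction

# Rows: inputs 00, 01, 10, 11. Columns: output 0, output 1.
XOR_IDEAL = [[1, 0], [0, 1], [0, 1], [1, 0]]
XOR_FLIPPED = [[0, 1], [1, 0], [1, 0], [0, 1]]   # output always inverted

def output_distribution(v, m):
    """Multiply an input distribution (row vector) by a transfer matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def error_per_input(faulty, ideal):
    """For each input row, the probability mass placed on incorrect outputs."""
    return [sum(p for p, ok in zip(frow, irow) if ok == 0)
            for frow, irow in zip(faulty, ideal)]

UNIFORM = [Fraction(1, 4)] * 4
DIST_IDEAL = output_distribution(UNIFORM, XOR_IDEAL)
DIST_FLIPPED = output_distribution(UNIFORM, XOR_FLIPPED)
ERRORS = error_per_input(XOR_FLIPPED, XOR_IDEAL)
```

Both output distributions come out as (1/2, 1/2), yet every input row has error probability 1, which is why the per-input view is needed.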

3.7.1. PTM Basics


We  analyze  non-deterministic  circuit  behavior  using  a  representation  we  introduced  [20, 21]  called 
the  probabilistic  transfer  matrix  (PTM).  A  PTM  for  a  gate  or  circuit  gives  the  probability  of  each 
output  combination,  conditioned  upon  the  input  combinations.  PTMs  can  model  gates  exhibiting 
varying  input-dependent  error  probabilities.  PTMs  form  an  algebra,  that  is,  a  set  closed  under 
specific  operations,  where  the  operations  in  question  are  matrix  multiplication  and  tensor  products. 
These  operations  may  be  used  to  compute  overall  circuit  behavior  by  combining  gate  PTMs  to 
form  circuit  PTMs.  Matrix  products  capture  serial  connections,  and  tensor  products  capture 
parallel  connections.  Also,  PTM-based  computations  implicitly  capture  signal  correlations  that 
are  caused  by  fan-out. 

Figure 17. (a) Two-input AND gate, (b) Its ITM, and (c) A PTM with various error probabilities for each input vector.


A PTM for an n-input m-output component is a 2^n x 2^m matrix M whose (i, j)-th element is the
probability of output j occurring in response to input i. The PTM of a fault-free component is
called its ideal transfer matrix (ITM) and the probability of every correct output value is 1. Figure
17 shows a two-input AND gate, its ITM, and a PTM with different probabilities of erroneous
output for each input combination. For example, the probability that (a, b) = (1, 1) yields the
wrong output c = 0 is p4.

PTM algorithms involve several types of matrix operations, one of which is the tensor product.
Given an n x m matrix A and a p x q matrix B, their tensor product M_T = A ⊗ B is an np x mq
matrix whose elements, with row and column indices written as bit strings, are:

\[ M_T(i_0 \ldots i_{n+p-1},\; j_0 \ldots j_{m+q-1}) = A(i_0 \ldots i_{n-1},\; j_0 \ldots j_{m-1}) \times B(i_n \ldots i_{n+p-1},\; j_m \ldots j_{m+q-1}) \]

The basic tensor operation on A and B needs nmpq scalar multiplications, and consumes nmpq
memory for storing the results.
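
For general matrices stored as nested lists, the tensor (Kronecker) product can be sketched in one expression. The function name `tensor` is an illustrative choice; the nested loops make the nmpq multiplication count visible.

```python
def tensor(a, b):
    """Kronecker product of an n x m matrix a and a p x q matrix b (np x mq result)."""
    # One output row per (row of a, row of b) pair; one entry per (x, y) pair.
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

I2 = [[1, 0], [0, 1]]          # PTM of a single fault-free wire
TWO_WIRES = tensor(I2, I2)     # two parallel wires: the 4 x 4 identity
MIXED = tensor([[1, 2]], [[3], [4]])
```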

Next we describe how the PTM of a p-level combinational circuit can be constructed from the
PTMs of its component gates and wires. First, derive ITMs or PTMs for all components. Then for
each topological level i containing k component PTMs {M_ij}, form the level PTM M_i = M_i1 ⊗ M_i2
⊗ ... ⊗ M_ik by repeated application of the tensor product. Finally, using ordinary matrix
multiplication, multiply all p level PTMs together to form the circuit PTM M = M_1 · M_2 · ... · M_p.


Figure 18. Circuit demonstrating PTM construction; dashed lines enclose fanout gates.

The six-level circuit C in Figure 18 shows how a circuit PTM is constructed. First, insert explicit
wiring and fanout "gates" into C as needed. Then construct level PTMs M_1, ..., M_6 for each level of
logic. The PTM of gate g_i is denoted by G_i; PTMs of a single wire and an n-branch fanout gate are
denoted by the identity matrix I_2 and F_n, respectively, in the following symbolic representations:
M_1 = I_2 ⊗ I_2 ⊗ F_2 ⊗ I_2 ⊗ I_2, M_2 = I_2 ⊗ G_1 ⊗ G_2 ⊗ I_2, M_3 = I_2 ⊗ G_3 ⊗ I_2, M_4 = I_2 ⊗ F_2 ⊗ I_2,
M_5 = G_4 ⊗ G_5, and M_6 = G_6. The final circuit PTM is:

\[ M = M_1 \cdot M_2 \cdot M_3 \cdot M_4 \cdot M_5 \cdot M_6 \]

which corresponds to the 32 x 2 matrix outlined in Figure 19.

Once the overall PTM is known, output signal probabilities can be calculated very easily by
multiplying the input signal distribution (row) vector V by the circuit PTM M, giving V · M. For
example, if all gates in C have the same error probability, 0.1, and all input signal probabilities are
0.5, the output probability of a 1 is 0.81.
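
The level-by-level construction can be illustrated on a much smaller circuit than the one in Figure 18. The sketch below assumes a hypothetical two-level circuit, OR(AND(a, b), c), in which each gate output flips with probability 0.1 and all inputs are uniformly distributed; the helper names and the circuit are not from the report.

```python
def tensor(a, b):
    """Kronecker product: PTM of two components placed in parallel."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def matmul(a, b):
    """Ordinary matrix product: PTM of two levels connected in series."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def gate_ptm(func, error):
    """PTM of a 2-input, 1-output gate whose output flips with probability error."""
    rows = []
    for i in range(4):
        out = func(i >> 1, i & 1)            # row index i encodes inputs (a, b)
        rows.append([1 - error, error] if out == 0 else [error, 1 - error])
    return rows

I2 = [[1.0, 0.0], [0.0, 1.0]]                # fault-free wire
AND = gate_ptm(lambda a, b: a & b, 0.1)
OR = gate_ptm(lambda a, b: a | b, 0.1)

M1 = tensor(AND, I2)      # level 1: AND(a, b) next to wire c, 8 x 4
M2 = OR                   # level 2: OR of the AND output and c, 4 x 2
M = matmul(M1, M2)        # circuit PTM, 8 x 2

V = [[1 / 8] * 8]         # uniform distribution over the 8 input vectors
OUT = matmul(V, M)[0]     # output distribution [P(out = 0), P(out = 1)]
```

For this small circuit `OUT[1]` is 0.62 up to rounding, which can be confirmed by hand: the noisy AND output is 1 with probability 0.3, so the ideal OR is 1 with probability 0.65, and 0.65 × 0.9 + 0.35 × 0.1 = 0.62.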

Figure 19. PTM structure of the circuit in Figure 18.

Directly constructing a circuit PTM from its level PTMs may consume a large amount of runtime
and memory. To enhance the scalability of PTM algorithms, we proposed using algebraic decision
diagrams (ADDs) to reduce the memory needed for storing matrices, and developed heuristics
such as dynamic evaluation ordering and hierarchical estimation to avoid unnecessary matrix
operations [12]. Such heuristics can reduce the computational complexity by several orders of
magnitude. However, the modified PTM calculations are still restricted to relatively small circuits
because they still construct the circuit PTM by multiplying level PTMs. In addition, PTM size
grows exponentially with the number of inputs and outputs of the circuit, so deriving a PTM from a large circuit
without a simplification scheme such as circuit partitioning may be impractical.

3.7.2.  Applications 

Besides  the  basic  operations  of  matrix  multiplication  and  tensor  product,  we  introduced  the 
following  three  operations  to  increase  the  scope  and  efficiency  of  PTM-based  computation: 

eliminate  variables:  This  computes  the  PTM  of  a  subset  of  inputs  or  outputs,  starting  from  a  given 
PTM.  It  can  also  be  used  to  compute  the  probability  of  error  of  individual  outputs. 

eliminate  redundant  variables:  This  eliminates  redundant  input  variables  that  result  from  tensoring 
matrices  of  gates  that  are  in  different  fan-out  branches  of  the  same  signal. 

fidelity:  This  measures  the  similarity  between  an  ITM  and  a  corresponding  PTM.  It  is  used  to 
evaluate  the  reliability  of  a  circuit. 
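
As a rough illustration of the eliminate variables operation, the sketch below marginalizes one output out of an explicit PTM by summing the columns that differ only in that output. The function name `eliminate_output`, the bit-ordering convention, and the example matrix are assumptions made for this example; the report's own implementation works on compressed matrices and also handles inputs.

```python
def eliminate_output(ptm, n_outputs, k):
    """Marginalize output k (0 = leftmost output bit) out of a PTM.

    ptm is a list of rows; each row has 2**n_outputs column probabilities.
    """
    shift = n_outputs - 1 - k                 # bit position of the eliminated output
    reduced = []
    for row in ptm:
        new_row = [0.0] * (len(row) // 2)
        for col, prob in enumerate(row):
            high = col >> (shift + 1)         # output bits to the left of k
            low = col & ((1 << shift) - 1)    # output bits to the right of k
            new_row[(high << shift) | low] += prob
        reduced.append(new_row)
    return reduced

# Hypothetical 1-input, 2-output block: two independent buffers, each with
# error probability 0.1. Columns are the output pairs 00, 01, 10, 11.
TWO_BUFFERS = [[0.81, 0.09, 0.09, 0.01],
               [0.01, 0.09, 0.09, 0.81]]

first_only = eliminate_output(TWO_BUFFERS, n_outputs=2, k=1)
print(first_only)   # approximately [[0.9, 0.1], [0.1, 0.9]]
```

The result is the PTM of the first output alone, from which its individual error probability (0.1 here) can be read directly.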

The  PTM  model  can  represent  a  wide  variety  of  faulty  circuit  behaviors,  including  both  hard  and 
soft errors. This is possible because separate probabilities are kept for each input and output
combination and are propagated simultaneously. Figure 20 lists some error types that can
be  precisely  represented  by  PTMs. 

Figure 20. PTMs for several error types: (a) Fault-free 2-1 MUX, (b) First input stuck-at 1, (c) Two inputs
swapped, (d) Probabilistic output bit flip with p = 0.05, and (e) MUX replaced by XOR.


We  can  also  use  PTMs  to  derive  polynomial  approximations  for  circuit  error  probabilities  in  terms 
of  gate  error  probabilities  for  the  purpose  of  determining  thresholds  of  acceptable  gate  error  for 
specific  circuits  [12]. 

3.7.3.  Computing  with  PTMs


Circuit PTMs have exponential space complexity because they contain information about all
possible input vectors. This complexity makes direct numerical computation with PTMs
impractical for circuits with more than about 15 inputs. In order to improve scalability, we
developed an implementation of the PTM framework that uses algebraic decision diagrams to
compress matrices. We also derived several ADD algorithms to combine PTMs directly in their
compressed forms. Figure 21(b) gives a PTM for the circuit in Figure 21(a) representing the case
where all gates experience output bit-flips with probability p = 0.05. Figure 21(c) shows the
corresponding ADD. As the latter figure indicates, the same values occur multiple times in the
matrix and suggest a possibility of compression. Due to the canonicity of ADD/BDD
representation, identical subgraphs, corresponding to identical submatrices, can be automatically
identified and eliminated during the process of ADD construction. In some cases, ADDs contain
exponentially fewer nodes than the number of entries in the explicit matrix representation. In such
cases, linear-algebraic transformations can be applied exponentially faster to the ADD than to the
matrix.
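
The node sharing described above can be illustrated with a toy decision-diagram builder. This is only a sketch under simplifying assumptions: a fixed variable order (row bits first, then the output bit), a plain dictionary as the unique table, and a hypothetical noisy NAND gate as the matrix. It is not the ADD package used in the report.

```python
def build_add(values, unique):
    """Return the id of the reduced decision-diagram node for 2**n terminal values.

    `unique` maps a structural key to a node id, so identical subgraphs
    (identical submatrices) are created only once.
    """
    if len(values) == 1:
        key = ("leaf", values[0])
    else:
        half = len(values) // 2
        lo = build_add(values[:half], unique)
        hi = build_add(values[half:], unique)
        if lo == hi:
            return lo                       # both branches identical: no test node needed
        key = ("node", len(values), lo, hi)
    return unique.setdefault(key, len(unique))

# Hypothetical 4 x 2 PTM of a NAND gate with output bit-flip probability 0.05,
# flattened row by row.
NAND_PTM = [[0.05, 0.95], [0.05, 0.95], [0.05, 0.95], [0.95, 0.05]]
flat = tuple(x for row in NAND_PTM for x in row)

unique = {}
root = build_add(flat, unique)
node_count = len(unique)
print(node_count, "nodes for", len(flat), "matrix entries")
```

Even for this tiny matrix the diagram needs fewer nodes than there are entries, because three of the four rows are identical; the savings grow with the amount of repetition in the matrix.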

Figure 21. (a) Sample logic circuit, (b) PTM where each gate experiences error with probability p = 0.05, and (c)
ADD encoding of the PTM.


Developing  efficient  ADD  algorithms  for  PTM  operations  is  a  significant  technical  challenge  that 
we  have  addressed.  We  adapted  previous  ADD  algorithms  from  [3]  and  [35]  for  tensor  and  matrix 
products. The original versions of their algorithms handle only square matrices, while PTMs are
generally rectangular. In addition, we developed ADD algorithms for the new PTM operations
defined earlier. These operations are needed for computing marginal-probability distributions,
reconciling dimensions, and estimating overall circuit-error probabilities. For details of this work,
see  [12]. 

3.8.  Testing  for  Probabilistic  Faults


Circuits have to be tested in order to ensure that their soft error rates do not exceed an acceptable
threshold. To estimate the expected soft error rate in the field, chips are typically exposed to
intense beams of protons or neutrons and the resulting error rate is measured. However, these
types of tests take a long time to conduct because random patterns may not be sensitive to
vulnerabilities in the circuit. We have developed methods for selecting test vectors such that test
application time is minimized [13, 14].

Generating  tests  for  probabilistic  faults  is  fundamentally  different  from  existing  testing  techniques. 
Probabilistic testing requires a multiset (a set with repetitions) of test patterns, since a given fault is
only present for a fraction of the computational cycles. Another difference is that some test vectors
detect transient faults with higher probability than others due to path-dependent effects like
electrical masking. Therefore, one can consider the likelihood of detection, or the sensitivity, of a
test vector to a fault.

3.8.1.  Test-Vector  Sensitivity 

The sensitivity sens(F, t) of a test vector t to a multi-fault set $F = \{f_i\}$, which occurs with probability
$P = \{p_i\}$ in circuit C with PTM $M_F$ and ITM $M$, is defined as the total probability that the output
under t is erroneous. Test t can be represented by the vector $v_t$, with 0s in all but the index
corresponding to t's input assignments. For instance, if t assigns 0s to all input signals and C has 3
inputs, then $v_t = [1\ 0\ 0\ 0\ 0\ 0\ 0\ 0]$. The sensitivity of t is the probability that the ideal and faulty
outputs are different, and it can be computed from the norm of the element-wise product of the
correct and faulty output vectors. This operation is similar to our fidelity operation defined for
vectors rather than matrices, and the computation takes the form:

$sens(F, t) = 1 - \|(v_t M_F) \mathbin{.*} (v_t M)\|_1$
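
The sensitivity computation can be sketched directly from this definition. The gate, its error probability, and the helper names below are hypothetical and chosen only to make the example self-contained.

```python
def vec_mat(v, m):
    """Row vector times matrix, both given as plain Python lists."""
    return [sum(x * y for x, y in zip(v, col)) for col in zip(*m)]

def sensitivity(v_t, ptm_faulty, itm):
    """sens = 1 - || (v_t M_F) .* (v_t M) ||_1 for a one-hot test vector v_t."""
    faulty = vec_mat(v_t, ptm_faulty)
    ideal = vec_mat(v_t, itm)
    return 1.0 - sum(abs(a * b) for a, b in zip(faulty, ideal))

# Hypothetical 2-input AND gate whose output flips with probability 0.1.
AND_ITM = [[1, 0], [1, 0], [1, 0], [0, 1]]
AND_PTM = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]

v_t = [0, 0, 0, 1]                    # test vector t assigns 1 to both inputs
s = sensitivity(v_t, AND_PTM, AND_ITM)
print(s)                              # approximately 0.1
```

Because the ideal output vector is one-hot, the element-wise product picks out the probability that the faulty circuit produces the correct value, and one minus that quantity is the probability of an observable error.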


A  second  method  of  sensitivity  computation  begins  with  a  pre-selected  complete  set  of  test  vectors 
for the permanent stuck-at faults corresponding to those in F. For each test vector in this set, we
compute the faulty output at each gate using vector-PTM multiplication through intermediate
gates. We also compute the ideal output at each gate. The ideal output is $v_t M$, and the faulty output
vector is $v_t M_F$. The advantage of this method is that we do not have to explicitly compute the
circuit PTM and ITM, steps which are computationally expensive.

A  caveat  in  output  vector  computation  is  that  fan-out  branches  result  in  inseparable  probability 
distributions of the branch signals. If these signals are marginalized or treated as separate, then
inaccuracies  can  occur  in  the  output  probabilities.  A  simple  method  of  handling  this  problem  is  to 
jointly  store  the  probabilities  of  these  signals  and  then  enlarge  any  gate  PTM  the  signals  encounter. 
We  accomplish  gate  enlarging  by  adding  inputs  to  the  gate  that  pass  through  unchanged,  i.e., 
tensoring  the  gate  matrix  with  an  identity  matrix  I. 
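
A small numeric example shows why fan-out branches must be kept as a joint distribution. It assumes a hypothetical signal x that is 1 with probability 0.5 and that fans out to both inputs of an ideal XOR gate; the helper names are invented for the illustration.

```python
def vec_mat(v, m):
    """Row vector times matrix, both given as plain Python lists."""
    return [sum(a * b for a, b in zip(v, col)) for col in zip(*m)]

def kron_vec(a, b):
    """Tensor product of two row vectors (treats the signals as independent)."""
    return [x * y for x in a for y in b]

F2 = [[1, 0, 0, 0],          # ideal fan-out gate: 0 -> 00, 1 -> 11
      [0, 0, 0, 1]]
XOR = [[1, 0], [0, 1], [0, 1], [1, 0]]

x = [0.5, 0.5]               # P(x = 0), P(x = 1)

joint = vec_mat(x, F2)                     # branches stored jointly: [0.5, 0, 0, 0.5]
exact = vec_mat(joint, XOR)                # XOR of a signal with itself is always 0
separate = vec_mat(kron_vec(x, x), XOR)    # branches wrongly treated as independent

print(exact)      # [1.0, 0.0]
print(separate)   # [0.5, 0.5]
```

Treating the two branches as separate signals predicts a 1 at the XOR output half of the time, although the output is in fact always 0.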

compute_faulty_output (Circuit C, testvector T)
{
    for (all inputs i ∈ C)
        vector[i] = create_row_vector (T[i])       // distribution of each primary input
    insert_fanout_gates (C)
    sort_topological (C)
    for (each node g ∈ C)
        for (each input j ∈ inputs(g))
            inputvector[g] = inputvector[g] ⊗ vector[j]
        enlarge (g, size(j) - 1)                   // pass jointly stored signals through unchanged
        outputvector[g] = inputvector[g] × PTM[g]
        for (each output o ∈ outputs(g))
            vector[o] = outputvector[g]
}

Figure 22. Algorithm for output computation.


An algorithm for computing the output of a test vector under probabilistic faults (encoded in gate
PTMs) is shown in Figure 22. The primary input values determined by the given test vectors are
converted into input vectors. Then, in topological order, the inputs for each gate are tensored
together to form the input vector for the gate. If any of the input signals are stored jointly with
other signals, the gate in question is enlarged by the number of additional signals. The gate PTM is
multiplied by the input vector to obtain the output vector. In the case of a multiple-output gate
such as a fan-out gate, the output vector stays as a joint probability distribution. In practice, output
distributions can become very large, through the accumulation of correlated signals. However, the
joint signals can be separated by using the eliminate variables operation, which may entail some
loss of accuracy.

This process can be repeated with gate ITMs (or functional simulation) to obtain the ideal output
vector. Finally, test vector sensitivity is computed according to the foregoing equation for
sens(F, t) using the fidelity operation applied to the ideal and faulty primary-output vectors.

3.8.2.  Test  Generation 

We use the test-vector sensitivity information computed in the previous section to generate
compact multisets of test vectors for detecting transient faults. Test-set compaction is closely
related to the standard SET COVER problem [10]. In that problem, elements of a set S are to be
covered by subsets. A minimal set of subsets must be chosen such that every member of S belongs
to at least one of the chosen subsets. In the context of test generation, the set S consists of all
possible faults, and each test vector represents a subset of faults, namely the subset of faults that it
detects. When testing for soft errors, tests may have to be repeated to increase the probability of
fault detection; therefore, multisets of tests are selected.

This connection between SET COVER and test compaction allows us to modify algorithms
designed for SET COVER and introduce related ILP formulations whose linear programming (LP)
relaxations can be solved in polynomial time. Furthermore, modifying the test-multiset objective
simply amounts to altering the ILP objective function.

Suppose a single fault f in a circuit C has an estimated probability p of occurrence. We confirm its
probability as follows:

1. Derive a test vector t with high sensitivity sens(f, t).

2. Apply t $k = \lfloor 1/sens(f, t) \rfloor$ times to C for one expected detection.

3. If we have d(f) ≫ 1 detections, we conclude that the actual probability of f is higher and reject
the estimated probability. We can estimate the probability that there are d(f) detections in k trials
using the binomial distribution. If the probability of d(f) detections is low, then it is likely that the
actual sensitivity sens(f, t) is higher than the estimate.

4. If sens(f, t) is higher than estimated, we can update our estimate and repeat this process.
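
The binomial check in step 3 can be sketched as follows. The numbers (estimated sensitivity 0.1, ten applications, five observed detections) are hypothetical, and using the upper-tail probability as the rejection criterion is an assumption of this example rather than a rule stated in the report.

```python
from math import comb

def prob_at_least(d, k, s):
    """P[at least d detections in k independent applications],
    when each application detects the fault with probability s."""
    return sum(comb(k, i) * s**i * (1 - s)**(k - i) for i in range(d, k + 1))

s_est = 0.1      # estimated sensitivity sens(f, t)
k = 10           # number of applications of t
tail = prob_at_least(5, k, s_est)
print(tail)      # about 0.0016
```

Observing five or more detections would be very unlikely under the estimated sensitivity, which suggests that the actual sensitivity is higher and the estimate should be revised.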

We  can  extend  the  above  method  to  multiple  faults  under  two  different  assumptions:  1)  there  are 
several  probabilistic  faults  but  the  circuit  experiences  only  a  single  fault  in  a  clock  cycle,  and  2) 
each  circuit  component  has  an  independent  fault  probability,  implying  that  multiple  faults  at 
different  locations  can  occur  in  the  same  clock  cycle.  The  goal  in  either  case  is  to  pick  a  multiset  of 
vectors T' taken from T such that |T'| is minimal. Recall that each test vector $t_i$ represents a subset
of F, i.e., each test vector detects a subset of faults. Under assumption 1, we minimize the size of
the multiset by using test vectors that are either especially sensitive to one fault or somewhat
sensitive to many faults. Therefore, to achieve a given detection probability of $p_{th}$ we need n tests,
where n satisfies $(1 - p)^n \le 1 - p_{th}$. Figure 23 gives the greedy algorithm for generating such a
multiset of test vectors, starting from a compacted set of test vectors. Our ILP formulation for
minimal test multiset generation is shown in Figure 24.
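
The number of repetitions n implied by the detection threshold can be computed with a short loop. The sketch assumes that p is the per-application detection probability and that the condition is $(1 - p)^n \le 1 - p_{th}$; the function name is invented for the example.

```python
def repetitions_needed(p, p_th):
    """Smallest n with (1 - p)**n <= 1 - p_th."""
    if not (0 < p <= 1) or not (0 <= p_th < 1):
        raise ValueError("need 0 < p <= 1 and 0 <= p_th < 1")
    n, miss = 0, 1.0          # miss = probability that n applications all miss
    while miss > 1 - p_th:
        miss *= 1 - p
        n += 1
    return n

print(repetitions_needed(0.1, 0.95))   # 29
```

With a per-application detection probability of 0.1, 29 repetitions are needed to reach a 95% chance of at least one detection, which is why highly sensitive vectors shorten the test so much.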

select_test_multiset (faults F, tests T, prob p_th)
{
    UF = F                                         // faults not yet covered
    while (!isempty (UF))
        T_max = find_maximal_test (UF, T, p_th)
        add_selected_test (ST, T_max)
        UF = remove_new_covered (UF, ST, p_th)
    return ST
}

remove_new_covered (faults UF, tests ST, prob p_th)
{
    for (each fault f ∈ UF)
        P_det = 1
        for (each test t ∈ ST)
            P_det = P_det × (1 - sens(f, t))       // probability that no test in ST detects f
        if (1 - P_det ≥ p_th)
            remove_fault (UF, f)
    return UF
}

Figure 23. Greedy algorithm for minimizing the number of test vectors (with repetition) required for fault
detection.
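
A runnable sketch of the greedy selection is given below. The report does not spell out find_maximal_test, so the selection rule used here (pick the test with the largest total sensitivity over the still-uncovered faults) is an assumption, as are the data layout `sens[t][f]` and the example numbers.

```python
def select_test_multiset(sens, p_th):
    """Greedy multiset selection.

    sens[t][f] is the sensitivity of test t to fault f.
    Returns a list of test indices (repetitions allowed).
    """
    n_faults = len(sens[0])
    miss = [1.0] * n_faults                  # probability each fault is still undetected
    uncovered = set(range(n_faults))
    selected = []
    while uncovered:
        # Assumed rule: the test most sensitive, in total, to the uncovered faults.
        best = max(range(len(sens)),
                   key=lambda t: sum(sens[t][f] for f in uncovered))
        if all(sens[best][f] == 0 for f in uncovered):
            raise ValueError("some faults cannot be detected by any test")
        selected.append(best)
        for f in list(uncovered):
            miss[f] *= 1 - sens[best][f]
            if 1 - miss[f] >= p_th:          # detection probability reached
                uncovered.discard(f)
    return selected

# Three hypothetical tests, two faults.
sens = [[0.5, 0.0],
        [0.0, 0.5],
        [0.3, 0.3]]
multiset = select_test_multiset(sens, p_th=0.7)
print(multiset)   # [2, 2, 2, 2]
```

In this example the test that is somewhat sensitive to both faults is repeated four times, which is shorter than alternating the two specialized tests (two applications each would leave both faults at 0.75 only after four applications as well, but three applications of the shared test already bring both faults to 0.657).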

Figure  25  shows  the  number  of  test  vectors  required  to  detect  probabilistic  stuck-at  faults  using  the 
method of Figure 23, and assuming a probability of 0.05. Rand is the average number of test
vectors  selected  during  random  test  generation.  These  results  show  that  our  algorithm  requires  53 
to  64%  fewer  test  vectors  than  random  selection,  even  with  a  small  complete  test  vector  set 
(generated  by  ATALANTA)  used  as  a  base  set. 


Figure  24.  ILP  formulations  for  test  set  generation  with  a  fixed  number  of  expected  detections:  (a)  To 
minimize  the  number  of  test  vectors,  and  (b)  To  maximize  fault  resolution. 

Figure 25. Number of test vectors required to detect input signal faults with various threshold probabilities $p_{th}$.

Once a multiset of test vectors is generated, the actual probability of error can be estimated using
Bayesian learning. This well-established artificial intelligence (AI) technique uses observation
(data) and prior domain knowledge to predict future events. In our case, the prior domain
knowledge is the expected or modeled fault probabilities in a circuit, and the data comes from
testing.
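
One simple way to realize such an update is a Beta-Bernoulli model, sketched below. The report does not state which Bayesian model was used, so the choice of a Beta prior, its strength, and the observed counts are all assumptions made for illustration.

```python
def beta_update(alpha, beta, errors, trials):
    """Posterior Beta parameters after observing `errors` erroneous outputs
    in `trials` test applications."""
    return alpha + errors, beta + (trials - errors)

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Prior centered on a modeled error probability of 0.05
# (the prior strength alpha + beta = 20 is an arbitrary choice).
alpha0, beta0 = 1.0, 19.0

# Hypothetical test outcome: 12 erroneous outputs in 100 applications.
alpha1, beta1 = beta_update(alpha0, beta0, errors=12, trials=100)
posterior_mean = beta_mean(alpha1, beta1)
print(posterior_mean)   # 13/120, about 0.108
```

The posterior mean moves from the modeled value toward the observed error frequency, with the prior strength controlling how quickly test data overrides the model.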

4.  Wireless  Network  Optimization


4.1.  Summary 

Mobile  ad  hoc  networks  (MANETs)  are  wireless  communication  networks  which  are  of  interest 
because  of  their  flexibility  and  ease  of  deployment.  MANET  nodes  are  often  powered  by  batteries, 
whose  replacement  is  difficult.  Internode  transmission  power  thus  constrains  the  network 
topology  and  the  topology  changes  continuously  due  to  mobility.  Hence,  understanding  node 
mobility  and  efficiently  managing  transmission  power  are  essential  for  successful  network 
operation. First, we analyze mathematical models of node movement and propose a new metric to
quantify  mobility.  Existing  network  control  algorithms  are  usually  evaluated  using  random 
mobility  models.  However,  since  such  models  employ  incompatible  mobility  parameters,  it  is 
hard  to  compare  the  performance  of  different  algorithms.  We  show  that  link  duration  has  a  nearly 
invariant  relationship  with  route  lifetime  regardless  of  the  adopted  mobility  model,  and  so  is  a 
good  mobility  metric.  Second,  we  investigate  the  issues  of  power  control  and  link  maintenance. 
Existing  power  control  schemes  are  mainly  intended  for  (pseudo)  static  networks,  and  their 
effectiveness  in  highly  mobile  networks  has  not  been  demonstrated.  We  develop  a  novel  algorithm, 
which  adaptively  controls  transmission  power  and  substantially  reduces  communication  power. 
We  analyze  the  impact  of  medium  access  control  on  performance,  and  show  that  the  widely  used 
request  to  send/clear  to  send  (RTS/CTS)  handshake  protocol  may  adversely  affect  the  network 
throughput.  We  further  present  a  way  to  maximize  the  network  throughput.  Third,  we  investigate 
the  problem  of  optimally  placing  base  station  and  relay  nodes  to  reduce  power  consumption  and 
improve  performance.  We  apply  non-linear  optimization  techniques  to  node  placement,  and 
present  distributed  node  placement  techniques  which  place  nodes  among  radio  obstacles  to 
minimize  energy  use.  Simulation  results  confirm  that  the  efficiency  of  the  proposed  algorithms  is 
comparable  to  that  of  an  existing  centralized  algorithm. 

Further details concerning the material in this chapter can be found in Sungsoon Cho's 2009 Ph.D.
dissertation  [52]  and  related  publications  [53,  54,  55]. 

4.2.  Introduction 

A  MANET  is  a  multi-hop,  wireless  network  consisting  of  a  set  of  interacting  hosts  or  nodes  that 
move  through  space.  Its  many  applications  include  networks  for  sensing  the  environment,  vehicle 
tracking systems, and emergency communications in a disaster area; see Figure 26. When a node
has to send a message to another node, the sender can either directly transmit the message to the
recipient, or transmit the message to intermediate nodes which relay the message to the final
destination.  MANET  operation  differs  from  traditional  wired  networks  in  several  respects.  The 
network  topology  constantly  changes  due  to  node  movement.  The  links  between  node  pairs  can  be 
created  or  deleted  by  adjusting  the  transmission  power  of  the  nodes.  The  communication  medium 
(air)  is  shared  by  multiple  hosts,  so  the  transmitted  data  can  be  garbled  or  lost  if  channel  access  is 
not controlled appropriately. For these reasons, efficient operation of a MANET poses some
unique  challenges.  Our  research  has  focused  on  the  following  issues:  the  impact  of  node  mobility 
on  network  topology,  transmission  power  control,  and  medium  access  control. 

Application                  Objective
---------------------------  ----------------------------------------------------------
Emergency networks           To provide connectivity between distant devices where the
                             network infrastructure is unavailable
Sensor networks              To monitor environmental conditions over a large area
Vehicular ad hoc networks    To enable real-time vehicle monitoring and adaptive
                             traffic control
Personal area networks       To provide flexible connectivity between personal
                             electronic devices or home appliances

Figure 26. Examples of MANET applications.

Much prior research has investigated ways to access the Internet using open access points.
Balasubramanian et al. [43] proposed opportunistic connection techniques which enable web
search in vehicles in urban areas where only intermittent connectivity is available. Banerjee et al.
[4] proposed an energy-efficient data-forwarding architecture based on fixed and battery-powered
data centers, which store and forward packets according to connection availability. However, a
serious limitation with these techniques is that connectivity is available only when open access
points exist nearby. To monitor environmental conditions over a large area, sensor networks can
be  used  [51,  56].  For  example,  at  Great  Duck  Island,  Maine,  32  wireless  sensor  nodes  are 
deployed  [71]  in  a  way  that  is  manageable  via  the  Internet;  this  habitat-monitoring  network  is 
expected  to  last  9  months  with  two  AA  batteries.  In  [83],  a  mobile  sensor  deployment  project  is 
presented,  which  delivers  wireless  sensors  to  a  road  using  global  positioning  system  (GPS) 
controlled  unmanned  aerial  vehicles  (UAVs).  The  sensor  nodes,  controlled  by  TinyOS  [84], 
constitute  a  multi-hop  communication  network,  track  nearby  vehicles  passing  on  the  road,  and 
report  tracking  data  to  a  base  station  via  a  UAV.  An  airborne  communication  network  is  another 
promising  MANET  application  [86]. 

Next,  we  summarize  the  terminology  and  formal  network  models  adopted  in  our  work.  Two  nodes 
are connected if and only if they can directly communicate with each other. A network's
connectivity or topology is represented by a graph G = (V, E) where V denotes the network nodes
and E = {(u, v)}, where u, v ∈ V. The data-forwarding process from a node to a neighboring node
is called a hop. A data delivery path consisting of one or more hops forms a route. We also use the
well-known 5-layer model [64] for the software and hardware parts of a network. Data generated
in the top (application) layer at host A are transferred to lower layers, and the bottom (physical)
layer transmits the data to the corresponding physical layer at host B. Then the data are transferred
up to B's application layer via the data-link, network and transport layers. The middle (network)
layer is responsible for maintaining appropriate routes from sources to destinations. The network
topology can change over time due to node movements, so usually there is no entity that has global
knowledge of the network's exact connection status. Hence, the delivery route from source to
destination can frequently change, and the nodes need a method, namely a routing protocol such as
dynamic source routing (DSR) [61], to discover the routes to use.

The physical layer of a MANET handles conversion between data bits and radio signals. The
received signal strength (RSS) of the radio signal at the receiver RX can be expressed by:

$RSS(RX) = Pow(TX) \cdot Gain(TX, RX) = Pow(TX) \cdot [k_1 \cdot Dist(TX, RX)^{\alpha} + k_2]^{-1}$

where Pow(TX) denotes the transmission power of node TX, Dist(TX, RX) the distance between TX
and RX, and α represents the radio attenuation of the environment. For successful data reception,
RSS should exceed some threshold characterized by the receiver's sensitivity, so that receivers
within the transmission range $r_{TX}$ of the transmitter can successfully receive the message. In our
work,  we  assume  that  a  MANET  node  can  receive  data  from  at  most  one  transmitter  at  a  time.  The 
main  criteria  for  successful  communication  are:  (1)  there  should  be  just  one  transmitting  node 
within  the  transmission  range  of  the  receiver,  and  (2)  the  signal-to-interference-noise  ratio  (SINR) 
should  be  above  some  threshold. 
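
The relation between transmission power, receiver threshold, and transmission range can be sketched numerically. The sketch assumes that the gain has the form $1 / (k_1 \cdot Dist^{\alpha} + k_2)$, i.e., that signal strength decays with distance; the parameter values are hypothetical and chosen only to give round numbers.

```python
def rss(pow_tx, dist, k1, k2, alpha):
    """Received signal strength under the assumed gain 1 / (k1 * dist**alpha + k2)."""
    return pow_tx / (k1 * dist**alpha + k2)

def transmission_range(pow_tx, threshold, k1, k2, alpha):
    """Largest distance at which rss() still meets the receiver threshold
    (0.0 if the threshold cannot be met even at zero distance)."""
    margin = pow_tx / threshold - k2
    return (margin / k1) ** (1.0 / alpha) if margin > 0 else 0.0

# Hypothetical values: unit power, threshold 1e-4, free-space-like exponent 2.
r_tx = transmission_range(pow_tx=1.0, threshold=1e-4, k1=1.0, k2=0.0, alpha=2.0)
print(r_tx)   # about 100
```

Because the range grows only as the 1/α power of the transmission power, doubling the range in this setting requires four times the power, which is what makes power control effective for shaping the topology.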


Topology control algorithms            Methodology
-------------------------------------  -----------------------------------
LINT and LILT [76]                     Bounded number of neighbors
K-neigh protocol [11]                  Bounded number of neighbors
Cone-based topology control [54,96]    At least one neighbor in every cone
MST-based topology control [55,56]     Local minimum spanning tree

Figure 27. Representative topology control algorithms and their methodology.

MANET  connections  change  over  time  due  to  node  movement,  and  the  resulting  network  topology 
changes  critically  affect  network  operation.  Understanding  the  impact  of  mobility  and  topology  on 
the  performance  is  essential  for  designing  network  control  algorithms.  Analysis  of  the  impact  of 
mobility  on  network  performance  usually  relies  on  simulations  using  artificial  random  mobility 
models  [45,  46,  91].  Unlike  wired  networks,  MANET  topology  can  be  actively  controlled  by 
adjusting  the  transmission  power  of  the  nodes.  The  range  of  the  nodes  should  be  assigned  so  that 
power  consumption  and  signal  interference  are  minimized,  while  connectivity  is  maintained. 

There  have  been  many  studies  of  how  transmission  range  affects  the  network  performance  and 
connectivity; see Figure 27. For instance, local information no topology (LINT) [79] simply
allows each node to maintain a bounded number of neighbors by incrementally adjusting its
transmission power. However, although it forms a network that is connected with a high
probability, LINT does not guarantee global connectivity. To deal with this problem, the related
local information link-state topology (LILT) method uses routing tables to maintain global
connectivity.  Most  control  algorithms  attempt  to  fix  the  network  connection  after  a  topology 
change  occurs,  and  cannot  reduce  the  number  of  connection  changes  caused  by  node  movements. 
To  remove  this  limitation,  we  have  developed  a  proactive  topology  control  algorithm  that  adjusts 
the  transmission  power  of  communicating  nodes,  prevents  frequent  link  breaks,  and  improves 
communication  performance. 

4.3.  Impact  of  Mobility  on  Performance

MANET  performance  is  highly  sensitive  to  changes  in  node-to-node  connections  (communication 
links)  caused  by  node  movement.  Link  instability  of  this  kind  has  proven  very  difficult  to  analyze 
mathematically,  so  previous  work  has  relied  heavily  on  simulation.  We  have  constructed  a 
mathematically  tractable  model  of  node  motion,  the  constant  velocity  (CV)  model,  and  used  it  to 
derive  a  precise  relation  between  mobility  and  connection  stability  [52,  53].  Our  analysis  allows 
determination  of  the  appropriate  frame  length  for  efficient  single-hop  communication.  We 
investigated  connection  stability  in  multi-hop  communication,  and  uncovered  some  underlying 
properties  of  previously  proposed  mobility  metrics.  In  particular,  we  demonstrated  that  link 
duration  has  a  nearly  invariant  relationship  with  the  stability  of  multi-hop  connections  for  a  wide 
range  of  mobility  models,  and  thus  forms  an  excellent  mobility  metric. 

We  approach  the  problem  as  follows.  First,  we  set  up  the  relatively  simple  CV  model,  and  derive 
two  mobility  metrics  for  it,  link  duration  (LD)  and  link  change  rate  (LCR)  [65,  75,  82].  Then  we 
derive  an  analytic  expression  for  successful  packet  delivery  via  single-hop  communication.  We 
further  investigate  the  relation  between  LD  and  multi-hop  route  stability.  We  quantify  connection 
stability  by  the  mean  residual  duration  (RD)  of  routes,  which  measures  how  long  multi-hop  routes 
last  under  the  given  mobility  conditions.  We  show  that,  among  previously  proposed  mobility 
metrics  [42],  LCR  is  unsuitable  for  estimating  link  stability  because  the  relation  between  LCR  and 
RD  depends  on  other  network  parameters.  In  contrast,  by  using  the  analytic  expressions  derived 
from  our  CV  model,  we  find  that  RD  is  a  function  of  LD;  i.e.,  the  multi-hop  connection  stability  is 
mainly  determined  by  single-hop  link  duration.  We  also  derive  simulation  results  which  show  that 
LD  has  a  consistent  relation  with  RD  for  a  wide  range  of  mobility  conditions.  Our  analysis  and 
simulations  confirm,  as  suggested  in  [49],  that  LD  constitutes  a  very  good,  unified  metric  for  link 
stability  with  many  types  of  mobility  models. 

4.3.1.  Constant  Velocity  Model 

Assume that nodes are randomly placed on an unbounded plane with a density ρ. The nodes move
linearly at a constant velocity v in random directions, but do not change direction while moving;
see Figure 28. We have shown that the average LCR $\lambda_{LCR}$ is given by:

$\lambda_{LCR} = 2\lambda_{gen} = \frac{16}{\pi} \rho r v$

where $\lambda_{gen}$ is the average link generation rate. For example, if nodes move at the average speed v
= 10 m/s with transmission range r = 10 m, and the node density is ρ = 0.02 m$^{-2}$, then $\lambda_{LCR} = 2\lambda_{gen}$
≈ 10.2 s$^{-1}$.
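
The numeric example can be checked with a few lines of code. The sketch assumes the closed form $\lambda_{gen} = 8 \rho r v / \pi$, which is the form that reproduces the quoted value of about 10.2 per second; the function names are invented for the example.

```python
from math import pi

def link_generation_rate(rho, r, v):
    """Average rate at which new links appear at a node (assumed form 8*rho*r*v/pi)."""
    return 8.0 * rho * r * v / pi

def link_change_rate(rho, r, v):
    """Link generations plus link breaks; in steady state the two rates are equal."""
    return 2.0 * link_generation_rate(rho, r, v)

lcr = link_change_rate(rho=0.02, r=10.0, v=10.0)
print(round(lcr, 1))   # 10.2
```

The rate is linear in each of the three parameters, so doubling the node speed, the range, or the density doubles the number of link changes a node experiences per second.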

Figure 28. Illustration of node movements in a MANET.


The link duration $T_{LD}$ is the time from link generation to link break, and is a measure of the
stability of single-hop connections [42]. We can describe the event that node $N_1$ passes through the
transmission region of a source $N_0$ with two parameters (X, θ). The link duration is then given by
$T_{LD} = Y(X) / (v\,\psi(\theta))$, where $Y(X) = 2(r^2 - X^2)^{1/2}$, and the mean value of $T_{LD}$ is:

[equation illegible in source]

It should be noted that LD is not the reciprocal of LCR or the link generation rate. LCR is the
reciprocal  of  the  time  between  two  successive  link  changes,  whereas  LD  is  defined  as  the  time 
between  link  generation  and  link  break. 

Suppose a receiver $N_1$ enters the communication region of a transmitter $N_0$ at time t = 0 and exits
the region at time t = $T_{LD}$. For a packet with transmission time $T_{comm}$ to complete communication
before $N_1$ moves out of range, the transmission should start at time t, where $0 \le t \le T_{LD}(X, \theta) - T_{comm}$.
Hence, the conditional probability of complete transmission is:

$p_{comp}(T_{comm} \mid X, \theta) = \max[T_{LD}(X, \theta) - T_{comm},\ 0] \,/\, T_{LD}(X, \theta)$

and the total probability of complete transmission is:

$p_{comp}(T_{comm}) = \iint p_{comp}(T_{comm} \mid x, \theta)\, g_{X,\Theta}(x, \theta)\, dx\, d\theta$

where $g_{X,\Theta}(x, \theta)$ denotes the joint probability density function of random variables X and θ. We
can rewrite the last expression as:

[double-integral expression for $p_{comp}(\tau)$; limits and integrand illegible in source]

where τ denotes a normalized communication time that can be interpreted as the ratio of node
mobility to communication speed.
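
The conditional completion probability can be evaluated for a single link crossing as follows. The sketch takes the relative speed of the two nodes as an input (the parameter `rel_speed`, standing in for the product of v and the direction-dependent factor) and assumes the packet start time is uniformly distributed over the lifetime of the link; the numbers are hypothetical.

```python
from math import sqrt

def link_duration(x, rel_speed, r):
    """Time needed to cross a chord at offset x (|x| < r) of a disk of radius r
    at relative speed rel_speed."""
    return 2.0 * sqrt(r * r - x * x) / rel_speed

def p_complete(t_ld, t_comm):
    """Probability that a transmission of length t_comm, started at a uniformly
    random time while the link is up, finishes before the link breaks."""
    return max(t_ld - t_comm, 0.0) / t_ld

t_ld = link_duration(x=6.0, rel_speed=8.0, r=10.0)   # chord of length 16 -> 2.0 s
p = p_complete(t_ld, t_comm=0.5)
print(t_ld, p)   # 2.0 0.75
```

Averaging `p_complete` over the joint distribution of the crossing parameters gives the total completion probability; a transmission longer than the link duration can never complete, so its contribution is zero.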


Figure 29. Plot of transmission probability $p_{comp}$ vs. communication time τ.

Figure 29 compares the foregoing analysis with simulated results using 100 mobile nodes on a 1 × 1
plane. The network parameters such as the transmission range r, the node speed v, and the
communication time $T_{comm}$ are varied so that τ ranges from 0 to 2.2. Comparison of the
simulated and calculated $p_{comp}$ values supports the accuracy of the CV model.

Although the success rate of single-hop communication is generally insensitive to mobility,
successful  data  delivery  over  multi-hop  routes  critically  depends  on  the  connection  stability.  We 
define  the  residual  duration  of  a  multi-hop  route  as  the  mean  time  from  the  route  discovery  to  the 
breaking  of  the  route.  RD  is  the  key  factor  which  determines  the  success  of  packet  delivery  over 
multi-hop  routes,  and  so  is  an  important  parameter  in  MANET  design  [41].  For  instance,  when  the 
RD  of  a  multi-hop  route  is  100  ms,  data  delivery  which  takes  500  ms  over  the  route  will  probably 
fail  due  to  the  connectivity  change.  Hence  RD  is  an  indicator  of  the  stability  of  multi-hop  routes, 
whereas  link  duration  indicates  the  stability  of  single-hop  links. 

4.3.2.  Mobility  Metric  Relationships


We now investigate the relationship between LD and RD. It is extremely difficult to derive
rigorous expressions for RD using existing mobility models such as random waypoint, random
walk, and boundless simulation area. Hence, we conducted a series of simulations with these
mobility models, which show a strong correlation between LD and RD. Later we analyze the
relationship between LD and RD using our CV model. The simulation is organized as follows.
First, 100 nodes are randomly placed on a 1 × 1 plane. The nodes start moving according to the
given mobility model. At time t = 0, the simulator computes the shortest multi-hop path from each
node to the root node $N_0$. When a multi-hop route breaks at time t, all routes with later hops that
pass along the broken route are also regarded as broken at time t. This procedure measures the RD
of routes from shortest-path discovery to their break times.
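
The break-time rule in this procedure can be sketched as follows. This is an illustrative reconstruction, not the simulator used in the study: it assumes the shortest paths form a tree given as a next-hop map, and that the break time of each link is already known. The names `route_break_times` and `mean_rd_by_hops` are hypothetical.

```python
from collections import defaultdict


def route_break_times(parent, link_break_time, root=0):
    """Return {node: (hop_count, break_time)} for every route to the root.

    parent[n] is the next hop of node n toward the root, and
    link_break_time[n] is the time at which the link (n, parent[n]) breaks.
    """
    routes = {root: (0, float("inf"))}

    def visit(node):
        if node not in routes:
            hops, upstream_break = visit(parent[node])
            # A route breaks as soon as any link on its path to the root breaks.
            routes[node] = (hops + 1, min(upstream_break, link_break_time[node]))
        return routes[node]

    for node in parent:
        visit(node)
    del routes[root]
    return routes


def mean_rd_by_hops(routes):
    """Average the route break times separately for each hop count."""
    samples = defaultdict(list)
    for hops, break_time in routes.values():
        samples[hops].append(break_time)
    return {hops: sum(ts) / len(ts) for hops, ts in samples.items()}


# Hypothetical 4-node chain rooted at node 0: 3 -> 2 -> 1 -> 0.
routes = route_break_times({1: 0, 2: 1, 3: 2}, {1: 5.0, 2: 2.0, 3: 9.0})
print(routes)
print(mean_rd_by_hops(routes))
```

In the example, the 3-hop route inherits the break time of the 2-hop route it passes along, even though its own first link survives longer.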


Figure 30. Simulation results with RWP model: (a) high-mobility case; (b) low-mobility case.

Here we only present results for the random waypoint (RWP) model; results for the other models can be
found in [52]. Mobile nodes in the RWP model [61] behave as follows. First, a node selects a
random destination within a bounded movement area, and moves toward it at a random speed v.
Once it arrives at the destination, it pauses for a predefined pausetime. The speed v is a random
variable uniformly distributed between speedmin and speedmax. By varying r, speedmin,
speedmax, and pausetime, we can control the node movements.
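
The RWP behavior described above can be sketched as a small generator. This is a time-stepped approximation written for illustration only: the unit-square movement area, the fixed sampling interval `dt`, and the assumption that a new destination and speed are drawn after each pause are choices made here, not details confirmed by the report.

```python
import math
import random


def rwp_positions(rng, speed_min, speed_max, pause_time, dt, steps):
    """Yield (x, y) positions of one RWP node on the unit square, every dt."""
    x, y = rng.random(), rng.random()
    dest = (rng.random(), rng.random())
    speed = rng.uniform(speed_min, speed_max)
    pause_left = 0.0
    for _ in range(steps):
        yield (x, y)
        if pause_left > 0.0:
            # The node is pausing at its last destination.
            pause_left = max(pause_left - dt, 0.0)
            continue
        dx, dy = dest[0] - x, dest[1] - y
        dist = math.hypot(dx, dy)
        step = speed * dt
        if step >= dist:
            # Arrived: pause, then head for a new destination at a new speed.
            x, y = dest
            pause_left = pause_time
            dest = (rng.random(), rng.random())
            speed = rng.uniform(speed_min, speed_max)
        else:
            x += step * dx / dist
            y += step * dy / dist


# Parameters of the high-mobility case; the seed and dt are arbitrary.
trace = list(rwp_positions(random.Random(1), 0.5, 1.0, 0.1, 0.01, 2000))
print(trace[:3])
```

Because pauses are rounded to whole time steps, the pause duration is only accurate to within `dt`.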

Figure 30(a) shows simulation results corresponding to a high-mobility case with parameters: r =
0.138, speedmin = 0.5, speedmax = 1.0, and pausetime = 0.1, for which the link duration $T_{LD}$ = 0.26.
The RD/LD ratio of 0.32 for 2-hop links, for example, indicates that the average residual duration
of a 2-hop link is 0.32 × $T_{LD}$ = 0.0832. Figure 30(b) corresponds to a low-mobility case: r = 0.25,
speedmin = 0.1, speedmax = 1.5, and pausetime = 5.0, for which Tld = '2..1>1 . In this case, due to the
relatively  long  pausetime,  a  considerable  number  of  nodes  are  observed  in  their  pause  state.  Hence, 
this  case  has  a  larger  LD  value  than  the  previous  case.  Although  the  two  cases  differ  in  their 
mobility  parameters  and  LDs,  their  RD/LD  ratios  are  nearly  identical.  Instead  of  changing 
pausetime  alone,  we  also  varied  speedmin  and  speedmax  and  obtained  nearly  the  same  distribution 
of RDs.

From our simulation studies, we saw that LD has a consistent relation with RD for a wide range of
mobility models and parameter values. Using our CV model, we can explain the strong correlation
between LD and RD, and also the relations between LD, LCR and RD. Consider the question:
What makes LD a better mobility metric than LCR? Suppose at time t = 0, a mobile node $N_0$
observes a neighbor node $N_1$ within its transmission range, and at time t = $T_0$, the neighbor leaves
the range. This exit time $T_0$ is a random variable, and its cumulative distribution function (CDF) is
F(t). Hence, the probability that a link remains connected until time t is given by:

$$P_{comp}(t \cdot v / r) = 1 - F(t)$$

Next, suppose that a k-hop route consists of k + 1 mobile nodes {$N_i$} and each node pair ($N_{j-1}$, $N_j$)
is connected by a link $L_j$. At time t = 0, the route is connected, and at t = $T_1$, the route breaks. Let
$G_k(t)$ and $g_k(t)$ denote the CDF and probability density function (PDF), respectively, of the random
variable $T_1$. Then the probability that the route remains connected until time t is $1 - G_k(t)$, and the
RD for this route is:

$$T_{RD,k} = \int_{0}^{\infty} t \, g_k(t) \, dt$$

Assuming the generation/break processes of the links in a route are mutually independent [59], the
probability that a k-hop route remains connected until time t becomes $1 - G_k(t) = [1 - F(t)]^k$, from
which we can deduce that:

$$T_{RD,k} = \int_{0}^{\infty} [1 - F(t)]^k \, dt$$

Hence,  we  see  that  RD  is  a  function  of  LD  rather  than  LCR,  which  is  why  LD  is  a  good  indicator  of 
multi-hop  connection  stability. 
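
To illustrate how RD follows from the link survival function alone, the sketch below integrates [1 - F(t)]^k numerically. It uses the standard identity that the mean of a nonnegative random variable equals the integral of its survival function. The exponential form of 1 - F(t) is an assumption made purely for illustration (it is not the CV-model distribution), and the function names are hypothetical.

```python
import math


def residual_duration(survival, hops, t_max, steps=20_000):
    """Trapezoidal estimate of the integral of survival(t)**hops over [0, t_max]."""
    h = t_max / steps
    total = 0.5 * (survival(0.0) ** hops + survival(t_max) ** hops)
    for i in range(1, steps):
        total += survival(i * h) ** hops
    return total * h


MEAN_LD = 0.26  # link duration quoted for the high-mobility case


def exp_survival(t):
    """Assumed (illustrative) link survival function 1 - F(t)."""
    return math.exp(-t / MEAN_LD)


ld = residual_duration(exp_survival, 1, 50 * MEAN_LD)
for hops in (1, 2, 3):
    rd = residual_duration(exp_survival, hops, 50 * MEAN_LD)
    print(hops, round(rd / ld, 3))
```

Under the exponential assumption the RD/LD ratio comes out close to 1/k. The simulated 2-hop ratio of 0.32 reported above is lower than 1/2, so the exponential shape should be read only as a placeholder for the actual link duration distribution.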

It is now easily seen that the metrics LD and LCR are related by the formula $\lambda_{LCR} \cdot T_{LD} = 2\rho\pi r^2$,
which implies that the product of half of LCR and LD equals the average node degree. This
relation also follows from Little's theorem, which states that the average number of customers in a
system, $\rho\pi r^2$, is equal to the product of the customer arrival rate, $\lambda_{gen} = \lambda_{LCR}/2$, and the average
time $T_{LD}$ that customers spend in the system.
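
As a worked instance of this relation, the sketch below evaluates the average node degree and the implied link change rate for the high-mobility parameters quoted earlier (r = 0.138, link duration 0.26). Taking the node density as 100 nodes per unit area and ignoring boundary effects of the 1 × 1 plane are simplifying assumptions made here; the function names are hypothetical.

```python
import math


def mean_degree(density, tx_range):
    """Average number of neighbors inside a disc of radius tx_range."""
    return density * math.pi * tx_range ** 2


def link_change_rate(density, tx_range, link_duration):
    """LCR implied by LCR * LD = 2 * (average node degree)."""
    return 2.0 * mean_degree(density, tx_range) / link_duration


degree = mean_degree(100, 0.138)
lcr = link_change_rate(100, 0.138, 0.26)
print(round(degree, 2), round(lcr, 1))
```

Half of the computed LCR is the link generation rate, which multiplied by the link duration returns the average degree, as Little's theorem requires.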

A  recent  study  by  Nayebi  et  al.  cites  our  work  and  observes  that  the  probability  distribution  of  LD 
of  the  RWP  model  is  similar  to  that  of  the  CV  model  [73].  Furthermore,  using  the  boundless 
random  direction  model  (BRDM)  derived  from  CV,  they  showed  that  the  PDF  of  RWP  can  be 
approximated  fairly  accurately  by  adding  stationary  nodes.  From  these  observations,  it  can  be 
seen  that  LD  with  RWP  is  practically  equivalent  to  that  with  CV.  Therefore,  we  conclude  that  LD 
is  a  good  unified  mobility  metric  for  most  types  of  mobile  ad  hoc  networks,  and  that  CV  is  a  useful 
model  for  mobility  analysis. 

4.4.  Distributed  Power-Aware  Link  Maintenance


We have developed a new management algorithm for MANETs called PALM (power-aware link
maintenance),  which  simultaneously  performs  transmission  power  control  and  route  connectivity 
maintenance.  Unlike  most  topology  control  algorithms,  PALM  manages  the  transmission  power 
of  active  nodes  only,  and  thus  eliminates  considerable  energy  and  channel  resource  waste.  We 
also  investigated  how  medium  access  control  (MAC)  parameters  affect  MANET  performance,  and 
found  ways  to  maximize  network  throughput  while  maintaining  a  high  energy  efficiency. 

4.4.1.  Introduction 

Topology control algorithms [48, 67, 79, 87] have long been considered for MANET power management.
They attempt to reduce the transmission power while maintaining a connected network topology.
However, they have several drawbacks. First, they require periodic beaconing, which wastes
energy and channel resources. Second, the actual connected routes still have to be discovered by
the routing layer, so that even when the network is connected, frequent rediscovery of connected
routes may occur. Third, in order to make the network insensitive to node mobility, topology
control algorithms must allow a longer transmission range, which may worsen network
performance.

The  so-called  BASIC  power  control  [62]  algorithm  has  been  proposed  to  mitigate  the  above 
problems.  The  transmitter  computes  the  minimum  transmission  power  for  data  delivery  based  on 
received  signal  strength  at  the  receiver.  Then  the  data  packets  are  transmitted  at  minimum  power, 
while  only  control  packets  such  as  request-to-send  and  clear-to-send  are  transmitted  at  full  power. 
While  BASIC  reduces  overall  energy  consumption,  it  still  has  shortcomings.  The  nodes  must 
abide  by  the  data  delivery  routes  discovered  by  the  routing  layer,  which  may  not  be  power-efficient, 
and  the  control  packets  transmitted  at  full  power  may  collide  with  other  nodes’  transmissions.  The 
power  control  MAC  (PCM)  protocol  [62]  mitigates  the  latter  problem  by  allowing  periodic 
full-power  transmission  for  DATA  packets,  but  the  former  problem  still  persists. 

The power-aware routing optimization (PARO) algorithm [57, 58] attempts to discover
power-efficient routes in a distributed manner without periodic beaconing. PARO assumes that all
nodes are within transmission range of each other. At the start, a source node A sends packets
directly to a destination node E. If another node C overhears the communication from A to E, and
determines that route redirection via itself conserves energy, then the route becomes A → C → E. This
redirection may be repeated many times. Sometimes the redirected routes become inefficient because a
shorter route may exist. PARO attempts to remove unnecessary redirectors to discover
power-efficient routes in a distributed manner. However, PARO still carries the risk of generating
inefficient routes, since it cannot easily determine hop distance.

Our proposed PALM scheme resolves the above problems with BASIC and PARO. First, we
define a virtual hop distance to efficiently assign and measure hop distances in the presence of
route redirections. Second, we add an accumulated energy field to packets, and enable the removal
of multiple redirectors. Third, we use different transmission power levels for control and
broadcast packets to mitigate the problems with BASIC's power control.

4.4.2.  PALM  Method 


The medium access control in PALM is based on IEEE 802.11. We assume ad hoc on-demand
distance vector (AODV) routing [77] as the routing algorithm. We also assume that transceivers are
equipped to detect the received signal strength RSS, which was defined earlier and is proportional
to the transmission power. PALM's power control method is similar to BASIC's except for the power levels of
RTS/CTS packets. The transmitter TX records its transmission power level Pow(TX) on outgoing
packets, and receivers $RX_i$, including overhearing nodes, estimate the channel gain Gain(TX, $RX_i$)
from the RSS. TX sends data packets at the power level given by:


$$Pow(TX, RX_i) = \beta \cdot \frac{RSS_{min}}{Gain(TX, RX_i)}$$


where $RSS_{min}$ and $\beta$ denote the minimum RSS for successful reception and a safety factor,
respectively.  PALM  uses  the  minimum  power  for  data  packets,  and  relatively  greater  power  for 
control  packets. 
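
The power rule above can be sketched in a few lines. The numeric constants are the RSS threshold, maximum power, and safety factor quoted in the performance evaluation of this section; using the RSS threshold as the minimum RSS and capping the result at the maximum power are assumptions added here, and the function names are hypothetical.

```python
RSS_MIN = 0.3652e-9  # W, RSS threshold quoted in the evaluation
POW_MAX = 0.2818     # W, maximum transmission power quoted in the evaluation
BETA = 1.5           # safety factor quoted in the evaluation


def channel_gain(rss, tx_power):
    """Estimate the channel gain from the received strength and the power
    level that the transmitter recorded on the packet."""
    return rss / tx_power


def data_tx_power(gain, rss_min=RSS_MIN, beta=BETA, pow_max=POW_MAX):
    """Power for data packets: beta * RSS_min / gain.
    Capping at pow_max is an assumption, not stated in the text."""
    return min(beta * rss_min / gain, pow_max)


# A packet sent at full power arrives at twice the minimum RSS.
gain = channel_gain(rss=2 * RSS_MIN, tx_power=POW_MAX)
print(data_tx_power(gain) / POW_MAX)
```

With a received strength of twice the minimum and a safety factor of 1.5, the data packets can be sent at about three quarters of full power.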


Figure 31. Communication between nodes in PALM, including route redirection. (Labels in the
figure: Overhearer, Source, Destination, Direct Delivery.)

PALM performs route redirection in a similar way to PARO; see Figure 31. Here node i is
forwarding data packets to next-hop node j, and node k is overhearing them. Solid lines in Figure
31 indicate intended packet flow, and dotted lines, overheard flow. The DATA packets contain the
source address i, the destination address j, and i's transmission power Pow(i). Node j replies with
an acknowledge (ACK) message containing i, j, Pow(j) and Gain(i, j). The overhearing node k
can now determine which of the direct i → j and redirected i → k → j routes consumes less energy,
and it can estimate the energy savings $\Delta E$. After overhearing the ACK, if redirection saves energy,
node k sends a request-to-redirect (RTR) packet to node i. On receiving an RTR message, i modifies its routing
table so that its new next-hop node becomes k instead of j. Subsequent packets are delivered along
i → k → j instead of i → j. Note that more than one overhearing node might send an
RTR to the source. To address this, when a node that is about to send an RTR message overhears another
RTR with a larger $\Delta E$ value, it discards its own RTR and does not send it.
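
The overhearer's decision can be sketched as follows. This is an illustrative model only: it compares per-packet transmission power (a proxy for energy when packet durations are equal), ignores receive and processing energy, and does not model the energy saving threshold used by PALM. The function names and the distance-to-the-fourth-power gain used in the example are hypothetical.

```python
RSS_MIN = 0.3652e-9  # W, RSS threshold quoted in the evaluation
BETA = 1.5           # safety factor quoted in the evaluation


def required_power(gain, rss_min=RSS_MIN, beta=BETA):
    """Minimum data transmission power for a hop with the given channel gain."""
    return beta * rss_min / gain


def energy_saving(gain_ij, gain_ik, gain_kj):
    """Power saved per packet by relaying i -> k -> j instead of i -> j."""
    direct = required_power(gain_ij)
    relayed = required_power(gain_ik) + required_power(gain_kj)
    return direct - relayed


def should_send_rtr(own_saving, overheard_savings):
    """Send an RTR only if redirection saves energy and no overheard RTR
    advertises a larger saving."""
    return own_saving > 0 and all(s <= own_saving for s in overheard_savings)


def example_gain(distance_m):
    """Hypothetical path-loss model: gain falls with distance to the 4th power."""
    return float(distance_m) ** -4


# Nodes i and j are 200 m apart and the overhearer k sits halfway between them.
saving = energy_saving(example_gain(200), example_gain(100), example_gain(100))
print(saving > 0, should_send_rtr(saving, [saving / 2]), should_send_rtr(saving, [saving * 2]))
```

The last call shows the suppression rule: a node that has overheard an RTR with a larger saving stays silent.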

PALM's redirection technique enables nodes to repair routing tables locally without propagating
rerouting requests. However, this carries the risk of generating loops in the delivery path. PALM
resolves loop problems in the following way. First, we define the virtual hop distance to the final
destination as a rational number instead of an integer. After a route is discovered, the
virtual hop distance has the same value as the usual integer hop distance.
When redirecting a link between nodes X and Y with hop distances $h_x$ and $h_y$, respectively, the
redirector computes its (virtual) hop distance as $(h_x + h_y)/2$. By taking this value between $h_x$ and
$h_y$, PALM maintains monotonically decreasing hop distance numbers along routes without
propagating the redirection event to other nodes. PALM has a locking method for use when
packets are forwarded along the links or when a link entry is added. Since its route redirection
scheme tends to increase the number of hops from source to destination, PALM also employs
mechanisms to mitigate some associated problems. For example, it includes a “route-warping”
technique to prevent an excessive number of hops from being created by the redirection procedure.
Increasing the number of hops tends to increase channel contention between nodes. In particular,
the long transmission range for RTS/CTS packets may worsen this problem. PALM addresses this
by reducing transmission power for broadcast and control packets. All these techniques
collectively enable PALM to continuously construct power-efficient and loop-free routes.
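
The virtual hop distance rule can be illustrated with exact rational arithmetic. The sketch below is a toy model of a single route stored as a list of hop distances; the list representation and the function name are hypothetical, and only the midpoint rule $(h_x + h_y)/2$ comes from the text.

```python
from fractions import Fraction


def redirector_hop_distance(h_x, h_y):
    """Virtual hop distance of a redirector inserted between X and Y."""
    return (Fraction(h_x) + Fraction(h_y)) / 2


# Hop distances to the destination along a freshly discovered 3-hop route.
route = [Fraction(3), Fraction(2), Fraction(1), Fraction(0)]

# Insert a redirector after the second node, twice in a row.
for position in (2, 2):
    new = redirector_hop_distance(route[position - 1], route[position])
    route.insert(position, new)

print([str(h) for h in route])  # ['3', '2', '7/4', '3/2', '1', '0']
```

The distances remain strictly decreasing toward the destination after each insertion, and no existing node had to change its own value, which is the property that rules out loops without propagating the redirection.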

4.4.3.  Performance  Evaluation 

We  conducted  an  extensive  set  of  experiments  to  evaluate  PALM.  We  used  the  ns  simulator  [85], 
and  modified  its  AODV,  MAC  802.11,  and  physical  layer  to  implement  PALM.  A  MANET 
model  for  the  experiments  was  constructed  as  follows.  The  radio  channel  parameters  are  taken 
from [62]: the RSS threshold $RSS_{th}$ = 0.3652 nW; the maximum transmission power $Pow_{max}$ =
281.8 mW, corresponding to a 250 m transmission range; and a channel bandwidth of 2 Mbps. We
assume that each node i can take any value between 0 and $Pow_{max}$ as its transmission power. The
following PALM-specific parameters are also used: energy saving threshold $\alpha$ = 0.8; safety factor $\beta$
= 1.5; active neighbor window $T_{act}$ = 0.5 s; usage expiration time $T_{unused}$ = 0.5 s; and unlock time
$T_{unlock}$ = 0.5 s. Finally, the two-ray ground model [62] was adopted as the radio propagation model.
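
The quoted radio parameters can be cross-checked against the far-field form of the two-ray ground model, in which received power is Pt·Gt·Gr·ht²·hr² / (d⁴·L). The antenna heights of 1.5 m, unit antenna gains, and unit system loss used below are assumptions (they are not stated in the text), and the function names are hypothetical.

```python
def two_ray_rx_power(tx_power_w, distance_m, ht=1.5, hr=1.5, gt=1.0, gr=1.0, loss=1.0):
    """Received power (W) under the far-field two-ray ground model."""
    return tx_power_w * gt * gr * ht ** 2 * hr ** 2 / (distance_m ** 4 * loss)


def max_range(tx_power_w, rss_threshold_w, ht=1.5, hr=1.5, gt=1.0, gr=1.0, loss=1.0):
    """Largest distance (m) at which the received power reaches the threshold."""
    return (tx_power_w * gt * gr * ht ** 2 * hr ** 2 / (rss_threshold_w * loss)) ** 0.25


rx = two_ray_rx_power(0.2818, 250.0)
print(rx * 1e9, "nW at 250 m")
print(max_range(0.2818, 0.3652e-9), "m range at full power")
```

Under these assumptions, 281.8 mW at 250 m gives a received power that matches the 0.3652 nW threshold to within rounding, which is consistent with the stated transmission range.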

Figure 32. Simulation results with varying node speed and a CBR of 80 kbps: (a) energy consumption per transmitted bit.


First, we carried out simulations with varying node speeds. The network consists of 35 mobile
nodes randomly distributed on a 700 × 700 m² plane. Source and destination nodes are randomly
chosen and a constant-bit-rate (CBR) source sends packets at 80 kbps. Packet size is 1000 bytes,
and packet interval time is 0.1 s. The RWP model was used with the following parameters: pause time =
0.0; minimum speed = maximum speed = v, varying from 5 m/s to 20 m/s. Each simulation was run
for 500 s. The maximum number of active neighbors is set to k = 8. Figure 32 presents the
simulation results with respect to node speed ranging from 5 m/s to 20 m/s. For comparison,
simulation results with AODV [77], dynamic source routing (DSR) [61], and destination-sequenced
distance vector (DSDV) routing [76] with the BASIC power control scheme are included; also included
is a variant of PARO. It can be seen that PALM's energy consumption is much lower than that of
the other algorithms, including PARO, and its packet loss rate is comparable to theirs.

Figure  33.  Simulation  results  with  varying  traffic  rate  and  a  node  speed  of  20  m/s.

Next,  we  conducted  simulations  that  varied  the  traffic  rate  from  32  to  80  kbps;  see  Figure  33.  The 
packet  size  is  1,000  bytes,  and  the  packet  interval  ranges  from  0.1  s  to  0.25  s.  The  node  speed  is  set 
to  20  m/s.  The  energy  consumption  of  DSDV  decreases  as  the  traffic  rate  grows  because  its 
beaconing  energy  is  amortized  by  the  increased  data  traffic.  However,  DSDV  still  results  in  high 
packet loss rate for the reason stated earlier. On the other hand, PALM produces a low packet loss
rate, while still consuming much less energy than the other algorithms.
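
The traffic rates quoted in these experiments follow directly from the packet size and the packet interval. The small helper below, whose name is hypothetical, makes the arithmetic explicit.

```python
def cbr_rate_kbps(packet_bytes, interval_s):
    """Offered load of a constant-bit-rate source in kbps."""
    return 8 * packet_bytes / interval_s / 1000


for interval in (0.1, 0.25):
    print(interval, "s ->", cbr_rate_kbps(1000, interval), "kbps")
```

A 1000-byte packet every 0.1 s corresponds to 80 kbps, and every 0.25 s to 32 kbps, matching the two ends of the traffic range.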

We also measured the impact of the maximum number k of active neighbors on the energy
consumption and the packet loss rate, and compared it with PARO's energy consumption and loss
rate. The simulation was done over a stationary network to isolate the effect of transmission power.
PALM's operation with a stationary network is identical to that of PARO except for the
transmission power for control and broadcast packets. We found that PALM's energy
consumption increases with k, but is still much lower than PARO's, even for fairly large k. Moreover,
PALM's packet loss rate is much lower than that of PARO. It can be concluded that
transmitting control packets to only a few of the active neighbors is sufficient to prevent signal
collision, and considerably improves communication concurrency.

4.4.4.  Impact  of  MAC  Parameters 

We investigated how medium access control parameters affect the performance of MANETs in
which the transmission power for data packets is controlled to the minimum value. We considered
the impact of three factors in the MAC layer on network performance: the RTS/CTS handshake,
transmission power control, and the carrier-sense threshold. The RTS/CTS handshake is used to
reduce signal collisions due to simultaneous transmissions by nodes located outside each other's
transmission range, which is called the hidden terminal problem [47]. It has recently been
observed that while the RTS/CTS handshake effectively reduces the hidden terminal problem in
local area networks, its effectiveness is limited in ad hoc networks [88, 89, 90].

For the purpose of reducing communication power consumption, the BASIC power control
method has long been considered, which adjusts the power level for DATA packets to the
minimum necessary value, while the RTS/CTS packets are transmitted at the maximum level.
However, recent studies [63, 72, 81, 93] have shown that when the RTS/CTS handshake is used
with BASIC, the overall network throughput may become worse than that of networks without
transmission power control, because the RTS/CTS packets can easily interfere with other DATA
packets sent at the reduced power level. In addition, the carrier-sense (CS) threshold affects the
concurrency of network communication, and in consequence, the end-to-end network throughput
also depends on the carrier-sense threshold [90].

We have confirmed through simulations that the optimal CS threshold is given by $CS_{th} = RSS_0/6z_0$,
where $z_0$ is the minimum signal-to-interference ratio (SIR) for a successful signal capture. Since
this condition does not depend on the communication distance or the radio attenuation exponent $\alpha$,
once $CS_{th}$ is set to the optimal value, the optimality condition remains satisfied even when the
network nodes continuously adjust their transmission power and the radio channel properties
change.

We have shown that use of the RTS/CTS handshake with power control protocols such as PALM
may  adversely  affect  the  network  performance.  Our  simulation  results  with  variants  of  the  BASIC 
power  control  scheme  confirm  that  the  use  of  maximum  power  for  control  packets  can  reduce  the 
network  throughput  by  70%,  and  the  total  energy  consumption  with  retransmission  taken  into 
account  is  much  higher  than  that  with  carrier  sense  multiple  access  (CSMA)  only.  Therefore, 
when  the  transmission  power  for  data  packets  in  MANETs  is  controlled  to  the  minimum  necessary 
level  through  the  channel  gain  feedback  between  nodes,  the  RTS/CTS  packets  with  the  maximum 
transmission  power  may  worsen  signal  collision  and  increase  energy  consumption.  We  also  have 
shown  that  a  high  network  capacity  can  be  obtained  by  reducing  the  transmission  power  for  control 
packets,  and  proposed  a  means  to  determine  the  appropriate  transmission  power  for  control 
packets  in  a  distributed  manner. 

4.5.  Node  Placement  Optimization 

Energy  conservation  is  a  key  issue  in  ad  hoc  wireless  network  operation.  Placing  additional  nodes 
at  appropriate  locations  can  substantially  lower  the  power  requirements  of  communicating  nodes. 
This  section  investigates  node  placement  for  energy-constrained  networks,  and  presents  node 
placement  algorithms  that  aim  to  minimize  total  energy  consumption.  We  first  consider  the  mobile 
base  station  (BS)  placement  problems  for  hierarchical  wireless  networks,  and  develop  efficient 
heuristic  solutions  for  them.  We  model  the  placement  of  multiple  BSs  as  a  clustering  optimization 
problem  in  which  BSs  and  user  nodes  are  treated  as  clusterheads  and  cluster  members,  respectively. 
We  also  devise  a  heuristic  that  discovers  the  central  area  of  a  multi-hop  network,  and  solves  the  BS 
placement  problem  with  multi-hop  connectivity.  Our  simulation  results  confirm  that  our  methods 
reduce  the  energy  consumption  of  wireless  networks  by  up  to  55%  compared  with  grid  networks. 
By  using  the  PALM  algorithm  presented  above,  we  devise  a  distributed  relay  placement  algorithm 
for  the  flat  network  structure  that  discovers  energy-efficient  routes  while  maintaining  connectivity 
in  the  presence  of  radio  obstruction.  Simulation  results  show  that  the  power  consumption  of  the 
distributed  implementation  is  greater  than  that  of  an  existing  centralized  algorithm  by  only  25%. 

Figure 34. (a) Hierarchical network with mobile BSs, and (b) Flat network with mobile relays.

4.5.1.  Introduction


Ad hoc networks can be categorized into two classes, hierarchical and flat; see Figure 34. In
hierarchical networks as in Figure 34(a), nodes are grouped into clusters, and a clusterhead or BS
node  in  each  cluster  takes  responsibility  for  intercluster  communication.  In  flat  networks  as  in 
Figure  34(b),  all  nodes  have  the  same  communication  ability,  and  data  from  the  source  user  node 
are  delivered  to  the  destination  via  intermediate  user  nodes  or  relays. 

We  first  consider  a  two-layered  hierarchical  network  structure  consisting  of  user  nodes  and  mobile 
BSs,  which  can  be  moved  to  any  location.  We  further  assume  that  the  BSs  have  a  separate  wireless 
channel  for  communication  between  them,  and  they  have  sufficiently  large  energy  sources. 
Consider  the  following  example.  Suppose  there  are  N  sensor  nodes  with  wireless  transceivers  in  a 
forest.  We  dispatch  R  UAVs  with  wireless  communication  capability  to  assist  network 
communication  between  the  sensors.  Where  should  we  place  the  UAVs  to  minimize  energy 
consumption  of  the  sensor  devices?  We  model  this  problem  as  a  clustering  problem  composed  of 
unconstrained  convex  optimization  subproblems.  We  treat  BSs  as  clusterheads,  and  aim  to 
minimize  the  energy  consumption  of  uplink  communication,  i.e.,  communication  from  user  nodes 
to  BSs.  As  the  clustering  problem  is  NP-hard,  we  develop  an  efficient  heuristic  method  based  on 
the  K-means  algorithm  [70,  80],  which  is  widely  used  for  determining  clusters  that  minimize  the 
mean  squared  distance  from  points  to  the  nearest  cluster  means.  Simulation  results  show  that  our 
BS  placement  algorithm  produces  near-optimal  solutions  with  high  probability.  Then,  we 
investigate  the  relay  placement  problem  for  the  flat  network  structure  by  modeling  it  as  an 
analogous  mechanical  system.  Our  goal  in  relay  placement  is  to  place  mobile  relays  at  appropriate 
locations,  and  minimize  the  total  power  consumption,  while  maintaining  network  connectivity. 
By  emulating  the  artificial  forces  exerted  on  mobile  relays,  and  utilizing  the  PALM  algorithm,  we 
solve  the  relay  placement  problem  in  the  presence  of  radio  obstruction  in  a  distributed  manner. 

4.5.2.  Base  Station  Placement  Optimization

We  investigate  base  station  placement  (BSP)  optimization  for  hierarchical  networks  considering 
four  different  network  structures  of  increasing  complexity,  which  are  summarized  in  Figure  35. 
For  the  node  placement  problem  of  hierarchical  networks,  we  again  adopt  the  radio  model  used 
earlier.  We  assume  that  nodes  expend  the  minimum  transmission  power  by  using  the  BASIC 
power  control  scheme.  We  further  assume  that  BSs  have  full  information  about  the  locations  of  all 
nodes.  We  also  allow  source  and  destination  nodes  to  be  randomly  chosen.  Unlike  sensor 
networks  in  which  most  nodes  send  data  to  a  single  sink,  data  can  be  generated  at  any  node,  and 
can  be  delivered  to  any  other  node,  and  the  source-destination  pair  can  frequently  change  over 
time. 

We  use  the  following  network  architecture,  which  resembles  that  of  the  near-term  digital  radio 
(NTDR)  network  [78,  92].  A  source  node  x  either  directly  transmits  data  to  the  nearest  base  station 
BS(x),  or  sends  it  over  the  shortest  multi-hop  path  to  BS(x).  Then  BS(x)  sends  the  data  to  the  base 
station  BS(y)  of  the  destination  y  through  a  separate  communication  channel.  Eventually,  BS(y) 
sends  the  data  to  the  final  destination  y.  In  this  structure,  the  inter-BS  network  provides  the 
network  backbone,  and  the  clusterhead  serves  as  the  gateway. 


Figure 36. Network structure with single-hop communications: (a) User nodes (white) directly communicate
with a single BS (black), and (b) User nodes communicate with their nearest BSs.


First we consider the single base station with single-hop links, or 1-SH, case. User nodes directly
communicate via a single BS as in Figure 36(a). Suppose that N nodes are placed at locations
denoted by vectors $p_i$. The nodes directly communicate with a BS located at x, and we want to
minimize the uplink communication energy consumed by the user nodes by controlling the
location of the BS. The BSP problem is then: What is the optimal location x? The energy consumed
by each node $p_i$ can be written as:

$$E(p_i) = \tau(p_i)\left(k_1 \lVert x - p_i \rVert^{\alpha} + k_2\right)$$

where $\tau(p_i)$ denotes the transmission time of $p_i$, which is the number of bits to be sent by $p_i$ divided
by the bandwidth of the transceiver, and $\lVert \cdot \rVert$ denotes the Euclidean norm. Thus the 1-SH BSP
problem becomes the following optimization problem:

$$\text{Minimize} \quad \sum_{i=1}^{N} \tau(p_i)\left(k_1 \lVert x - p_i \rVert^{\alpha} + k_2\right)$$

where the BS location x is the optimization variable. This is an unconstrained optimization problem
which turns out to be convex and can be efficiently solved with existing convex optimization
techniques. 
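
As one possible way to solve the 1-SH problem, the sketch below applies gradient descent with a backtracking line search to the cost $\sum_i \tau(p_i)(k_1 \lVert x - p_i \rVert^{\alpha} + k_2)$ in two dimensions. It is not the solver used in the study; the function names, the default constants, and the stopping tolerances are assumptions, and the exponent is assumed to be greater than 1 so that the cost is differentiable.

```python
import math


def uplink_energy(x, nodes, tx_times, k1, k2, alpha):
    """Total uplink energy of all nodes for a BS located at x."""
    return sum(t * (k1 * math.dist(x, p) ** alpha + k2)
               for p, t in zip(nodes, tx_times))


def place_single_bs(nodes, tx_times, k1=1.0, k2=0.0, alpha=2.0, max_iter=1000):
    """Gradient descent with backtracking line search for the 1-SH problem."""
    weight = sum(tx_times)
    # Start from the weighted centroid, which is already optimal when alpha == 2.
    x = [sum(t * p[d] for p, t in zip(nodes, tx_times)) / weight for d in (0, 1)]
    step = 1.0
    for _ in range(max_iter):
        grad = [0.0, 0.0]
        for p, t in zip(nodes, tx_times):
            dist = math.dist(x, p)
            if dist > 0.0:
                scale = t * k1 * alpha * dist ** (alpha - 2.0)
                grad[0] += scale * (x[0] - p[0])
                grad[1] += scale * (x[1] - p[1])
        grad_sq = grad[0] ** 2 + grad[1] ** 2
        if grad_sq < 1e-18:
            break
        cost = uplink_energy(x, nodes, tx_times, k1, k2, alpha)
        while step > 1e-12:
            cand = [x[0] - step * grad[0], x[1] - step * grad[1]]
            if uplink_energy(cand, nodes, tx_times, k1, k2, alpha) <= cost - 0.5 * step * grad_sq:
                break
            step *= 0.5
        else:
            break  # no acceptable step left: treat x as converged
        x = cand
        step *= 2.0
    return tuple(x)


print(place_single_bs([(0, 0), (2, 0), (0, 2)], [1, 1, 2]))
print(place_single_bs([(0, 0), (4, 0)], [1, 8], alpha=3.0))
```

With an exponent of 2 the result is the transmission-time-weighted centroid of the nodes. With a larger exponent the BS moves further toward the heavily loaded node than the centroid would, but less than the weights alone suggest, because distant nodes are penalized more steeply.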

To extend the preceding case to multiple-BS placement (the K-SH problem), assume that there are
K BSs and N nodes as in Figure 36(b). Each node communicates with its nearest BS, and there is a
separate channel for inter-BS communication. The energy cost function of the 1-SH case does not
change its form with respect to the BS location. However, such an invariant function cannot be
used for the K-SH case, as user nodes change their BSs depending on the BS locations, which
makes this problem NP-hard [74]. Define clusters {$C_k$} as K disjoint subsets of E = {$p_i$}. The
K-SH BSP problem then becomes the following clustering optimization problem:

$$\text{Minimize} \quad \sum_{k=1}^{K} \sum_{i \in C_k} \tau(p_i)\left(k_1 \lVert x_k - p_i \rVert^{\alpha} + k_2\right)$$

where clusters $C_k$ and BS locations $x_k$ are the optimization variables. This is a generalization of
the well-known K-means clustering problem. To solve it, we use the following straightforward
heuristic. First, assign each node to the cluster of its nearest BS. Then compute the BS location of
each cluster by solving the 1-SH optimization problem:

$$\text{Minimize} \quad \sum_{i \in C_k} \tau(p_i)\left(k_1 \lVert x_k - p_i \rVert^{\alpha} + k_2\right)$$

These  steps  are  repeated  until  no  change  in  clustering  occurs. 
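
The two alternating steps can be sketched as follows. To keep the example short, it assumes a distance exponent of 2, for which the 1-SH subproblem has a closed-form solution (the transmission-time-weighted centroid of the cluster); for other exponents a numerical 1-SH solver would take its place. The function name, the handling of empty clusters, and the round limit are assumptions made here.

```python
import math


def k_sh(nodes, tx_times, seeds, max_rounds=100):
    """Alternate nearest-BS assignment and per-cluster BS re-placement."""
    stations = [tuple(s) for s in seeds]
    assignment = None
    for _ in range(max_rounds):
        # Step 1: each node joins the cluster of its nearest BS.
        nearest = [min(range(len(stations)), key=lambda k: math.dist(p, stations[k]))
                   for p in nodes]
        if nearest == assignment:
            break  # clustering unchanged: converged
        assignment = nearest
        # Step 2: re-place each BS at the weighted centroid of its cluster
        # (the 1-SH optimum when the distance exponent is 2).
        for k in range(len(stations)):
            members = [(p, t) for p, t, a in zip(nodes, tx_times, assignment) if a == k]
            if members:  # an empty cluster leaves its BS where it is
                weight = sum(t for _, t in members)
                stations[k] = (sum(t * p[0] for p, t in members) / weight,
                               sum(t * p[1] for p, t in members) / weight)
    return stations, assignment


nodes = [(0, 0), (0, 1), (10, 0), (10, 1)]
stations, assignment = k_sh(nodes, [1, 1, 1, 1], seeds=[(0, 0), (10, 0)])
print(stations, assignment)
```

Each round can only lower or keep the total cost, and there are finitely many clusterings, which is why the loop terminates.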

Convergence of the K-SH algorithm can be proved easily. Optimality depends on the initial BS
locations $x_k$, just as in the K-means case [60]. For this reason, in order to obtain satisfactory
solutions, K-SH needs to be repeated with different initial locations, for which we have also
devised an efficient heuristic method. Our solution methods for the 1-MH and K-MH cases further
extend these ideas, and can be found in [55].

4.5.3.  Simulation  Results


The  solution  quality  of  the  proposed  BSP  algorithms  depends  on  the  initial  clustering  seeds. 
Figure 37 shows simple examples of optimal and suboptimal BS placements produced by our
K-SH algorithm. Figure 38 gives histograms of solutions produced by (a) random seeding, (b) the
seeding  method  of  Ostrovsky  et  al.  [74],  (c)  the  modified  method  of  [74],  and  (d)  the  farthest-first 
method;  the  resolution  of  the  data  is  15  mW.  Each  seeding  method  was  repeated  1000  times,  and 
the  solution  quality  was  measured  by  the  total  transmission  power.  According  to  Figure  38(b),  the 
probability  that  the  method  of  [74]  produces  an  optimal  solution  is  approximately  1%,  which  is 
only  slightly  higher  than  that  of  the  random  seeding  (0.4%).  Figure  38(d)  shows  that  the 
farthest-first  method  produces  the  optimal  solution  with  high  probability  (38%).  It  performs 
clustering  in  a  greedy  manner,  and  consequently,  outperforms  the  other  seeding  methods  when 
nodes  are  almost  uniformly  distributed.  Thus,  the  farthest-first  method  was  adopted  for  the 
remaining  experiments. 
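For reference, a greedy farthest-first seeding can be sketched as follows. This is the textbook farthest-first traversal, shown under the assumption that the first seed is chosen by the caller; the report does not specify how its first seed is picked, and the name `farthest_first` is hypothetical.

```python
import math

def farthest_first(nodes, k, first=0):
    """Pick k seeds: start from nodes[first], then repeatedly take the node
    that is farthest from all seeds chosen so far."""
    seeds = [nodes[first]]
    # distance from each node to its closest seed so far
    closest = [math.dist(p, seeds[0]) for p in nodes]
    while len(seeds) < k:
        i = max(range(len(nodes)), key=closest.__getitem__)
        seeds.append(nodes[i])
        closest = [min(old, math.dist(p, nodes[i])) for old, p in zip(closest, nodes)]
    return seeds
```

Because each new seed is placed as far as possible from the existing ones, the seeds spread out over the node set, which is consistent with the method working well on nearly uniform node distributions.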

Figure 37. Examples of: (a) Optimal, and (b) Sub-optimal BS placement. User nodes (white) form a grid
structure and BS locations (black) are computed by K-SH.

Figure 38. Total power consumption after: (a) Random seeding, (b) The seeding method of [34] (original), (c)
The seeding method of [34] (modified), and (d) The farthest-first method.

4.5.4. Distributed Relay Placement Optimization

While the BS placement algorithms presented above compute optimal locations for base stations that act as
clusterheads with relatively large energy sources, the algorithms considered next place
mobile relays that forward data from previous-hop nodes to next-hop nodes. We also investigate
the  distributed  relay  placement  (DRP)  problem  in  the  presence  of  radio  obstacles. 

In  this  analysis,  we  assume  that  only  one  user  node  is  the  sink  node  that  serves  as  the  final 
destination  of  all  data  generated  by  other  users.  Sources  send  new  data  to  a  sink  through  direct  or 
multi-hop  routes.  Relays  only  forward  data  from  sources  to  the  sink.  The  network  attempts  to 
construct  energy-efficient  routes  using  PALM.  We  assume  that  nodes  adjust  their  transmission 
power to the minimum necessary level, and the receiver sensitivity $RSS_{min}$ is set to 1. The power cost
for communication across distance d is $\mathrm{Cost}(d) = k_1 d^{\alpha} + k_2$. We also assume that relays can sense
the  direction  of  incoming  signals  and  the  distance  to  the  transmitter,  and  so  can  deduce  the  relative 
locations  of  their  neighbors.  For  the  current  network  topology,  relays  identify  their  immediate 
neighbors,  and  move  toward  locally  optimal  locations  according  to  an  artificial  “force”  function. 
When an object is located between two nodes, direct communication may be obstructed. We
assume  that  when  no  line-of-sight  communication  is  available,  the  radio  gain  between  two  nodes 
becomes  zero.  In  addition,  mobile  relays  have  full  information  about  the  locations  and  the  shapes 
of  radio  obstacles. 

In  order  to  formulate  and  solve  the  relay  placement  problem,  we  modeled  the  network  as  a 
mechanical  system  with  springs  and  a  viscous  damper — a  widely  used  approach  for  solving 
optimization  problems  [69].  Specifically,  we  model  the  communication  energy  cost  as  an  artificial 
potential  energy  stored  in  springs,  and  nodes  as  objects  with  unit  mass,  moving  according  to  the 
artificial  force  field.  Movement  of  objects  then  resembles  the  progressive  solution  improvement  of 
the  steepest  descent  method  [50]  in  convex  optimization.  Thus,  through  this  model,  we  can  obtain 
a  locally  optimal  solution  as  the  mechanical  system  converges  to  an  equilibrium  point. 
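The mechanical analogy can be illustrated with a small numerical sketch. It is not the report's controller: it assumes a single relay with fixed neighbors, a quadratic cost (path-loss exponent 2) so that the spring force is linear, and hypothetical parameter names (`damping`, `dt`, `steps`). The relay is a unit mass pulled by the negative gradient of the weighted communication cost and slowed by a viscous damper.

```python
def relax_relay(x, neighbors, weights, k1=1.0, damping=2.0, dt=0.01, steps=5000):
    """Integrate unit-mass dynamics x'' = F(x) - damping * x' until it settles.

    F(x) is minus the gradient of sum_j w_j * k1 * |x - p_j|^2, i.e. the
    artificial spring force toward each neighbor p_j with data-rate weight w_j.
    """
    x = list(x)
    v = [0.0] * len(x)
    for _ in range(steps):
        force = [sum(-2.0 * k1 * w * (x[d] - p[d]) for p, w in zip(neighbors, weights))
                 for d in range(len(x))]
        # semi-implicit Euler step: update velocity first, then position
        v = [vd + dt * (fd - damping * vd) for vd, fd in zip(v, force)]
        x = [xd + dt * vd for xd, vd in zip(x, v)]
    return x
```

With this quadratic cost the equilibrium is the weighted centroid of the neighbors, so a relay between two equally loaded neighbors settles at their midpoint, and a heavier link pulls the relay toward its endpoint.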

Based  on  this  mechanical  analog,  we  designed  a  distributed  controller  of  mobile  relays  as  follows. 
The  sink  node  determines  the  area  where  an  additional  relay  is  needed,  and  dispatches  a  mobile 
relay  to  that  area.  On  arrival,  the  new  relay  starts  to  discover  energy-efficient  routes,  and  moves  in 
the  direction  specified  by  the  mechanical  model  so  that  the  total  energy  consumption  decreases. 
Then  the  sink  dispatches  another  relay.  These  steps  continue  until  the  desired  number  of  relays  are 
placed  in  the  network. 

In  the  DRP  algorithm,  each  relay  performs  the  following  operations  repeatedly:  route  redirection, 
location  sensing,  energy  cost  estimation,  and  movement  control.  First,  it  continuously  attempts  to 
discover  energy-efficient  routes  by  performing  route  redirection  as  described  for  PALM.  Second, 
both  sources  and  relays  continuously  broadcast  hello  messages  to  their  immediate  neighbors.  By 
sensing  the  incoming  hello  messages,  each  relay  can  estimate  the  energy  cost  to  reach  its  neighbors. 
Third,  the  relay  estimates  the  energy  cost  around  itself,  and  forwards  the  estimated  value  to  the 
next-hop node. Each source node records its data generation rate b(s) on the data packets it
transmits. By inspecting the source address and b(s) recorded on the received data packets, relay i
can determine the data rate w(i, j) across the link (i, j), where j is a neighbor node of i. Thus, the
relay can estimate the total energy cost $E_{p,total}(i)$ around itself. Source nodes perform a similar
computation, and record their addresses and the energy cost on outgoing packets. Relay i inspects
the $E_{p,total}(i')$ value recorded on the packet, and if $E_{p,total}(i) > E_{p,total}(i')$ then it records its own
address and energy cost on the packet, and forwards it. In consequence, the sink node will receive
the address of the node that has the greatest energy cost. Fourth, each mobile relay controls its
movement according to the mechanical analog; see the references for details.
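The way the largest energy cost travels to the sink can be sketched as a running maximum carried in the packet. The packet layout (an address and a cost) and the function names below are assumptions made only for illustration.

```python
def forward(packet, node_addr, node_cost):
    """Packet contents after node `node_addr` has inspected it.

    The node overwrites the recorded (address, cost) pair only when its own
    estimated energy cost exceeds the recorded one.
    """
    _, recorded_cost = packet
    if node_cost > recorded_cost:
        return (node_addr, node_cost)
    return packet

def route_to_sink(source_addr, source_cost, relays):
    """Pass a packet from a source through a list of (address, cost) relays."""
    packet = (source_addr, source_cost)
    for addr, cost in relays:
        packet = forward(packet, addr, cost)
    return packet
```

Whatever the route length, the packet that reaches the sink names the node with the greatest recorded cost along that route.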

The  final  form  of  the  propulsion  force  on  the  relay,  including  reaction  forces  when  radio  obstacles 
exist,  is: 


$$P_i = \sum_{j \in N(i)} -g \cdot P_{i,j} + \sum_{j \in Obst(i)} f_{i,j} - E_v \cdot \dot{x}_i$$

where Obst(i) denotes the set of nodes j adjacent to i for which the distance between a radio obstacle and
the communication link (i, j) is less than $d_{critical}$.


4.5.5.  Simulation  Results 

We first consider the network space without radio obstacles, and compare the energy
efficiency of the network structure produced by the DRP algorithm with that of Li and Cassandras’
algorithm [69]. Unlike our algorithm, their relay placement algorithm is a centralized scheme with
assumptions similar to those of our mechanical model. Its operation is as follows. First, a bottleneck
node  k  is  chosen,  and  their  algorithm  investigates  all  possible  ways  of  linking  nodes  around  the 
bottleneck  node,  the  number  of  which  is  S*!™  -  2,  where  m  is  the  number  of  k’s  neighbors.  Next,  it 
selects  the  best  connectivity  with  the  minimum  power  cost,  and  adds  a  new  relay  according  to  the 
selected  connectivity.  Then,  for  the  given  network  connectivity,  optimal  node  locations  are 
determined  through  the  so-called  inner-force  method.  The  algorithm  continues  to  add  relays  until 
the intended number of relay nodes are inserted. For a fair comparison, we adopt the radio
parameters from [69]. Figure 39 shows simulation results for the DRP algorithm without radio
obstacles.

Figure 39. Simulation results for placing n relays with no radio obstacles. Panels: (a) n = 0 (no relay),
(b) n = 1, (d) n = 4, (e) n = 8, (f) n = 12.

Figure 40. Simulation results for power consumption of relay placement algorithms: (a) Total power
consumption, and (b) Power overhead due to the distributed implementation.


Figure 40 compares the power consumption of our DRP algorithm and that of Li and Cassandras.
As Figure 40(a) shows, the power consumption of each algorithm decreases as the number of
added relays n increases, and the gap between these algorithms also decreases. Let $P_{distributed}$ and
$P_{centralized}$ denote the total power consumption of the algorithms. Figure 40(b) shows the relative
overhead of the power consumption measured as $(P_{distributed} - P_{centralized}) / P_{centralized}$. When n is small,
the overhead is large, but as n grows, it decreases and becomes as small as 25%. The DRP
algorithm constructs a power-efficient network structure in a distributed manner, and its power
efficiency is comparable to that of Li and Cassandras’ centralized algorithm.

Figure 41. Relay placement with obstacles; n denotes the number of added relays.


Figure 41 provides simulation results for relay placement in the presence of radio obstacles. Nine
user nodes (black) are placed in a grid structure on a plane. The sink node is at the bottom left,
and the other eight nodes are data sources. There are eight radio obstacles (grey disks), and
n  mobile  relays  (white)  are  inserted.  It  can  be  seen  that  the  relay  arrangement  maintains 
connectivity  in  the  presence  of  the  radio  obstacles. 

Figure  42(a)  shows  simulation  results  for  total  power  consumption  obtained  by  the  DRP  algorithm 
in  the  presence  of  radio  obstructions  arranged  as  in  Figure  41;  results  without  radio  obstructions 
are  also  shown.  As  expected,  the  total  power  consumption  monotonically  decreases  as  the  number 
of  relays  increases.  Figure  42(b)  shows  the  ratio  between  the  power  consumption  values  with  and 
without obstructions. It can be seen that once about six relays have been added, the power overhead due to
obstructions stays almost constant relative to the case without obstructions. Thus, we
conclude that the proposed DRP technique effectively reduces the total power consumption, even
in the presence of obstructions, while maintaining the network’s connectivity.

Figure 42. Simulation results for the proposed DRP algorithm: (a) Total power consumption, and (b) Ratio of
power consumption levels with and without obstacles.

5. Simulation and Verification of Quantum Circuits
5.1.  Introduction 

The  recent  interest  in  quantum  circuits  is  motivated  by  several  complementary  considerations. 
Quantum  information-processing  is  rapidly  becoming  a  reality  as  it  allows  manipulating  matter  at 
unprecedented  scale.  Such  manipulations  may  create  particular  entangled  states  or  implement 
specific  quantum  evolutions  —  they  find  uses  in  atomic  clocks,  ultra-precise  metrology, 
high-resolution lithography, optical communication, etc. In August 2009, EE Times reported that
"researchers at the National Institute of Standards and Technology ... demonstrated continuous
quantum operations using a trapped-ion processor" that stored quantum states in beryllium ions
for up to 15 seconds. A large-scale architecture for quantum computing proposed in June 2009 by
physicists from Michigan and Maryland suspends linear ion crystals in an anharmonic trap. Their
design provides for 100 ytterbium-based logical qubits and 20 additional ions for laser cooling.
Commercial  quantum  communications  and  cryptography  have  so  far  relied  on  quantum-optical 
implementations,  where  qubits  are  stored  in  photons  and  can  be  transported  over  great  distances. 

Engineers  traditionally  simulate  new  designs  before  implementing  them.  Such  simulation  may 
identify  subtle  design  flaws  and  save  both  costs  and  effort,  and  is  therefore  closely  related  to 
verification.  Both  simulation  and  verification  techniques  typically  use  well-understood  host 
hardware,  e.g.,  one  can  simulate  a  quantum  circuit  on  a  commonly-used  conventional  computer. 
However, working with even the smallest quantum circuits (2-3 qubits) requires powerful software
tools. A quantum circuit can simultaneously process a superposition (linear combination) of many
n-bit input combinations, and its functionality is specified by a complex-valued $2^n \times 2^n$ matrix.
Because of this apparent data explosion, simulation and verification of quantum circuits are harder
than in the case of conventional circuits. Nevertheless, transformations of quantum circuits, e.g.,
adapting circuits to spin-chain architectures, may introduce errors that need to be detected.
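The size of the matrix description can be made concrete with a brute-force sketch for n = 2 qubits. It builds the $2^n \times 2^n$ matrix of a Hadamard gate followed by a CNOT and applies it to the state |00>. The helper names are hypothetical and the code uses dense Python lists, so it is practical only for very small n; each added qubit doubles both matrix dimensions.

```python
import math

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def matmul(a, b):
    """Dense matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def apply_matrix(u, state):
    """Matrix-vector product u * state."""
    return [sum(u[i][j] * state[j] for j in range(len(state))) for i in range(len(u))]

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
I2 = [[1, 0], [0, 1]]
# CNOT with qubit 0 (most significant bit) as control and qubit 1 as target
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

U = matmul(CNOT, kron(H, I2))          # 4 x 4 matrix of the whole circuit
bell = apply_matrix(U, [1, 0, 0, 0])   # amplitudes of |00>, |01>, |10>, |11>
```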

Circuit  equivalence  checking  is  a  particularly  popular  circuit  verification  technique.  It  is  usually 
considered  in  conjunction  with  circuit  optimization,  where  an  initial  circuit  is  believed  to  be 
correct,  but  the  optimized  circuit  requires  verification.  One  then  needs  to  check  if  the  two  circuits 
produce  equivalent  outputs  on  all  inputs.  Equivalence  of  conventional  circuits  can  be  checked  by 
random  simulation,  i.e.,  by  looking  for  input  combinations  that  would  disprove  equivalence. 
However, non-exhaustive simulation may overlook rare corner cases where the two circuits
differ, and therefore cannot definitively conclude that two circuits are equivalent. Such
conclusions  can  be  produced  instead  by  formal  proof  techniques.  To  this  end,  our  work  is  the  first 
to  develop  equivalence-checking  techniques  for  quantum  circuits  and  implement  them  in  reusable 
software.  We  also  found  that  the  links  between  quantum  circuit  simulation  and 
equivalence-checking  are  stronger  than  those  for  conventional  circuits. 
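The limitation of random simulation for conventional circuits is easy to reproduce. In the sketch below, two hypothetical Boolean functions on 16 inputs differ on a single input combination. A handful of random trials will almost always miss it, whereas exhaustive enumeration (standing in here for a formal method) finds it. The function names and the example circuits are illustrative assumptions.

```python
import itertools
import random

def random_check(f, g, n, trials, rng):
    """Return an input on which f and g differ, or None if none was sampled."""
    for _ in range(trials):
        x = tuple(rng.randrange(2) for _ in range(n))
        if f(x) != g(x):
            return x
    return None

def exhaustive_check(f, g, n):
    """Return the first input on which f and g differ, or None if equivalent."""
    for x in itertools.product((0, 1), repeat=n):
        if f(x) != g(x):
            return x
    return None

n = 16
and_all = lambda x: int(all(x))   # 1 only on the all-ones input
const_zero = lambda x: 0          # differs from and_all on exactly one input

sampled = random_check(and_all, const_zero, n, trials=100, rng=random.Random(0))
witness = exhaustive_check(and_all, const_zero, n)
```

A `None` result from `random_check` therefore says nothing definitive, while a `None` result from `exhaustive_check` is a proof of equivalence, at a cost exponential in n.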

The construction of quantum information processors is often motivated by the hope that quantum
circuits can compete with conventional computing and communication. Quantum-mechanical
effects may lead to computational speed-ups, more secure or more efficient communication, better
keeping of secrets, etc. To this end, Siemens and ID Quantique announced in August 2009 the
commercial availability in Europe of quantumly-secure communication implemented in existing
“dark” optical fiber. Current research seeks new circuits and algorithms with revolutionary
behavior as in Shor’s work on number-factoring or provable limits on possible behaviors. While
proving abstract limitations on the success of unknown algorithms appears more difficult, a
common line of reasoning for such results is based on simulation. For example, if the behavior of
a quantum circuit can be faithfully simulated on a conventional computer, then the possible
speed-up achieved by the quantum circuit is limited by the cost of simulation. Thus, aside from
sanity-checking new designs for quantum information-processing hardware, more efficient
simulation can lead to sharper bounds on all possible algorithms.

Since  the  outcome  of  a  quantum  computation  is  probabilistic,  we  clarify  our  notion  of  simulation. 
By  a  randomized  simulation,  we  mean  a  classical  randomized  algorithm  whose  output  distribution 
on  an  input  is  identical  to  that  of  the  simulated  quantum  computation.  By  a  deterministic 
simulation,  we  mean  a  classical  deterministic  algorithm  which,  on  a  given  pair  of  input  x  and 
output  y  of  the  quantum  computation,  outputs  the  probability  that  y  is  observed  at  the  end  of  the 
quantum  computation  on  x.  To  simulate  a  quantum  circuit,  one  may  use  a  naive  brute-force 
calculation  of  quantum  amplitudes  that  entails  exponential  overhead.  Achieving  significantly 
smaller overhead in the generic case appears hopeless; in fact, this led Feynman to suggest
that  quantum  computers  may  outperform  conventional  ones  in  some  tasks.  Therefore,  in  the 
existing  literature,  theoretical  results  for  simulating  quantum  circuits  are  mostly  available  for 
restricted  classes  of  circuits.  Our  work  offers  new  such  classes  and  new  results,  with  application  to 
some  of  the  most  important  quantum  circuits  in  existence.  In  particular,  follow-up  work  by  other 
researchers  (Aharonov  and  Short)  applied  our  techniques  to  show  that  the  QFT  can  be  simulated  in 
polynomial  time  on  classical  computers. 

5.2.  Algorithms  for  Quantum  Circuit  Simulation 

Classes  of  quantum  circuits  that  admit  efficient  simulation  are  often  distinguished  by  a  restricted 
“gate  library”,  but  do  not  impose  additional  restrictions  on  how  gates  are  interconnected  or 
sequenced.  A  case  in  point  is  the  seminal  Gottesman-Knill  Theorem  [106]  and  its  recent 
improvement  by  Aaronson  and  Gottesman  [94].  These  results  apply  only  to  circuits  with  stabilizer 
gates — Controlled-NOT,  Hadamard,  Phase,  and  single-qubit  measurements  in  the  so-called 
Clifford  group.  Another  example  is  given  by  match  gates  defined  and  studied  by  Valiant  [127], 
and  extended  by  Terhal  and  DiVincenzo  [125]. 

A  different  way  to  impose  a  restriction  on  a  class  of  quantum  circuits  is  to  limit  the  amount  of 
entanglement  in  intermediate  states.  Jozsa  and  Linden  [110],  as  well  as  Vidal  [130]  demonstrate 
efficient  classical  simulation  of  such  circuits  and  conclude  that  achieving  quantum  speed-ups 
requires  more  than  a  bounded  amount  of  entanglement.  In  this  work,  we  pursue  a  different 
approach to efficient simulation and allow the use of arbitrary gates. More specifically, we assume
a general quantum circuit model in which a gate is a general quantum operation (a so-called
physically realizable operator) on a constant number of qubits. This model, proposed and studied
by  Aharonov,  Kitaev  and  Nisan  [95],  generalizes  the  standard  quantum  circuit  model,  defined  by 
Yao  [134],  where  each  gate  is  unitary  and  measurements  are  applied  at  the  end  of  the  computation. 
We also assume that (i) the computation starts with a fixed unentangled state in the computational
basis, and (ii) at the end each qubit is either measured or traced out.

Our simulation builds upon the framework of tensor network contraction. Being a direct
generalization of matrices, tensors capture a wide range of linear phenomena including vectors,
operators, multi-linear forms, etc. They provide convenient and fundamental mathematical tools
in many branches of physics such as fluid and solid mechanics, and general relativity [108]. More
recently, several methods have been developed to simulate quantum evolution by contracting
variants of tensor networks, under the names of Matrix Product States, Projected Entangled Pairs
States, etc. [118, 128, 129, 130, 131, 135]. In this framework, a quantum circuit is regarded as a
network of tensors. The simulation contracts edges one by one and performs the convolution of
the corresponding tensors, until there is only one vertex left. Having degree 0, this vertex must be
labeled by a single number, which gives the final measurement probability sought by simulation.
Unlike other simulation techniques, we do not necessarily simulate individual gates in their
original order; in fact, a given gate may even be simulated partially at several stages of the
simulation. While tensor network contraction has been used in previous work, little was known
about optimal contraction orders. We prove that the minimal cost of contraction is determined by
the treewidth $tw(G_C)$ of the circuit graph $G_C$. Moreover, existing constructions that approximate
optimal tree-decompositions (e.g. [122]) produce near-optimal contraction sequences. Intuitively,
the smaller a graph’s treewidth is, the closer it is to a tree, and a tree decomposition is a drawing of
the graph that makes it look as much like a tree as possible. Our result allows us to leverage the
extensive graph-theoretical literature dealing with the properties and computation of treewidth.
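A toy version of the contraction idea is sketched below. It is a simplification and not the algorithm analyzed in this work: the network is built from state vectors and unitary gates, so the final scalar is an amplitude whose squared modulus gives the probability, whereas the general model above works with physically realizable operators. Tensors are stored as dictionaries over binary edge labels, and `tensor`, `contract` and `contract_all` are hypothetical names. The example circuit is a Hadamard on qubit 0 followed by a CNOT, started in |00> and projected onto the outcome 11.

```python
import functools
import itertools
import math

def tensor(labels, fn):
    """Tensor over binary indices; `labels` names the edges it touches."""
    labels = tuple(labels)
    data = {idx: fn(*idx) for idx in itertools.product((0, 1), repeat=len(labels))}
    return labels, data

def contract(a, b):
    """Merge two vertices, summing over every edge label they share."""
    (la, da), (lb, db) = a, b
    shared = set(la) & set(lb)
    free = tuple(l for l in la + lb if l not in shared)
    out = {}
    for ia, va in da.items():
        env_a = dict(zip(la, ia))
        for ib, vb in db.items():
            env_b = dict(zip(lb, ib))
            if any(env_a[l] != env_b[l] for l in shared):
                continue  # shared edges must carry the same value
            env = {**env_a, **env_b}
            key = tuple(env[l] for l in free)
            out[key] = out.get(key, 0) + va * vb
    return free, out

def contract_all(tensors):
    """Contract a closed network down to a single degree-0 vertex."""
    labels, data = functools.reduce(contract, tensors)
    assert labels == ()
    return data[()]

s = 1 / math.sqrt(2)
network = [
    tensor("a", lambda a: 1 if a == 0 else 0),             # qubit 0 starts in |0>
    tensor("ac", lambda a, c: s * (-1) ** (a * c)),         # Hadamard on qubit 0
    tensor("b", lambda b: 1 if b == 0 else 0),             # qubit 1 starts in |0>
    tensor("cbde", lambda c, b, d, e: 1 if (d == c and e == b ^ c) else 0),  # CNOT
    tensor("d", lambda d: 1 if d == 1 else 0),             # outcome 1 on qubit 0
    tensor("e", lambda e: 1 if e == 1 else 0),             # outcome 1 on qubit 1
]

amplitude = contract_all(network)
probability = abs(amplitude) ** 2
```

Any contraction order yields the same scalar, but the sizes of the intermediate tensors depend on the order, which is where the treewidth of the circuit graph enters.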

Theorem 1.1. Let C be a quantum circuit with T gates whose underlying circuit graph is $G_C$.
Then C can be simulated deterministically in time $\exp[O(tw(G_C))]$.

Hence  given  a  function  computable  in  polynomial  time  by  a  quantum  algorithm  but  not  classically, 
any  polynomial-size  quantum  circuit  computing  the  function  must  have  super-logarithmic 
treewidth.  The  following  corollary  is  an  immediate  consequence. 

Corollary 1.2. Any polynomial-size quantum circuit of logarithmic treewidth can be simulated
deterministically in polynomial time.
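The step from Theorem 1.1 to Corollary 1.2 is a one-line calculation, written out here for a circuit on n qubits with T gates. The constant c is introduced only for this illustration.

```latex
% Assumptions: tw(G_C) \le c \log T for a constant c, and T = n^{O(1)}.
\exp\bigl[O(\mathrm{tw}(G_C))\bigr]
  \;\le\; \exp\bigl[O(c \log T)\bigr]
  \;=\; T^{O(1)}
  \;=\; n^{O(1)}
```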

Quantum  formulas  defined  and  studied  by  Yao  [134]  are  quantum  circuits  whose  underlying 
graphs  are  trees.  Roychowdhury  and  Vatan  [124]  showed  that  quantum  formulas  can  be 
efficiently simulated deterministically. Since every quantum formula has treewidth 1, Corollary 1.2
gives an alternative efficient simulation. Our focus on the topology of the quantum circuit allows
us to accommodate arbitrary gates, as long as their qubit-width (number of inputs) is limited by a
constant. In particular, Corollary 1.2 implies efficient simulation of some circuits that create the
maximum amount of entanglement in a partition of the qubits, e.g., a layer of two-qubit gates.
Therefore, our results are not implied by previously published techniques. We now articulate
some implications of our main result for classes of quantum circuits, in terms of properties of their
underlying graphs. The following two classes of graphs are well-studied, and their treewidths are
known. The class of series-parallel graphs arises in electric circuits, and such circuits have
treewidth $\le 2$. Planar graphs G with n vertices are known to have treewidth $tw(G) = O(\sqrt{|V(G)|})$
[97].

Corollary 1.3. Any polynomial-size parallel serial quantum circuit can be simulated
deterministically in polynomial time.

Corollary 1.4. A size-T planar quantum circuit can be simulated deterministically in $\exp[O(\sqrt{T})]$
time.

Another  corollary  deals  with  a  topological  restriction  representative  of  many  physical  realizations 
of quantum circuits. A circuit is said to be q-local-interacting if, under a linear ordering of its
qubits, each gate acts only on qubits that are at most distance q apart. A circuit is said to be
local-interacting if it is q-local-interacting with a constant q independent of the circuit size. Such
local-interaction circuits generalize the restriction of qubit couplings to nearest-neighbor qubits
(e.g., in a spin-chain) commonly appearing in proposals for building quantum computers, where
qubits may be stationary and cannot be coupled arbitrarily. To this end, we observe that the
treewidth of any local-interaction circuit of logarithmic depth is at most logarithmic.

Corollary 1.5. Let C be a q-local-interacting quantum circuit of size T and depth D. Then
C can be simulated deterministically in $\exp[O(qD)]$ time. In particular, if C is a
polynomial-size local-interacting circuit with logarithmic depth, then it can be simulated
deterministically in polynomial time.

An  important  limitation  of  our  techniques  is  that  a  circuit  family  with  sufficiently  fast-growing 
treewidth  may  require  super-polynomial  resources  for  simulation.  In  particular,  this  seems  to  be 
the  case  with  known  circuits  for  modular  exponentiation.  Therefore,  there  is  little  hope  to 
efficiently simulate number-factoring algorithms using tree decompositions. As an extreme
illustration,  we  found  a  depth-4  circuit — including  the  final  measurement  as  the  4th  layer — that 
has  large  treewidth. 

Theorem 1.7. There exists a depth-4 quantum circuit on n qubits using only one- and two-qubit
gates such that its treewidth is $\Omega(n)$.

Note that a circuit satisfying the assumption in the above theorem must have O(n) size. Our
construction  is  based  on  expander  graphs,  whose  treewidth  must  be  linear  in  the  number  of 
vertices.  This  finding  is  consistent  with  the  obstacles  to  efficient  simulation  that  are  evident  in  the 
results  of  Terhal  and  DiVincenzo  [126],  later  extended  by  Fenner  et  al.  [107].  In  contrast,  we  are 
able  to  efficiently  simulate  any  depth-3  circuit  deterministically  while  the  simulation  in  [126]  is 
probabilistic. 

Theorem 1.8. Assuming that only one- and two-qubit gates are allowed, any polynomial-size
depth-3 quantum circuit can be simulated deterministically in polynomial time.

Additional details are available in our extended paper published in the SIAM Journal on
Computing 38(3), pp. 963-981, 2008.

5.3. Equivalence-Checking for Quantum Circuits

Equivalence  checking  is  a  basic  task  in  the  synthesis  and  verification  of  classical  digital  circuits.  A 
hardware  designer  needs  to  know  whether  a  circuit’s  implementation  is  functionally  equivalent  to 
its  specification.  In  addition,  the  equivalence  of  different  versions  of  the  same  (sub-)circuit  must 
be  checked  throughout  the  complex  computer-aided  design  process,  which  includes  circuit 
optimization,  technology  mapping  to  specific  device  architectures  and  adaptation  to  compact  and 
reliable  layout.  Combinational  equivalence  checking  for  conventional  circuits  is  solved  in  practice 
with  high-performance  solvers  for  Boolean  Satisfiability,  and  its  negative  version 
(non-equivalence)  is  NP-complete.  Equivalence  checking  is  likely  to  be  just  as  important  in 
quantum  CAD,  and  the  non-equivalence  of  quantum  circuits  is  QMA-complete.  However,  the 
equivalence  of  quantum  states  and  operators  can  be  subtle.  Unlike  their  classical  counterparts, 
qubits  and  quantum  gates  can  differ  by  global  and  relative  phase,  and  yet  be  equivalent  upon 
measurement. 

Building  upon  our  DARPA-sponsored  work  on  QuIDD  data  structures  for  simulating  quantum 
circuits,  we  developed  QuIDD  algorithms  to  check  quantum  states  and  operators  for  equivalence. 
Our  research  discovered  a  surprising  variety  of  algorithms  available  to  solve  this  problem.  This 
variety  is  partly  due  to  the  fact  that  quantum  circuits  require  the  classical  concept  of  equivalence  to 
be  extended  to  account  for  global  and  relative  phase.  This  broader  notion  of  equivalence  creates 
several  new  opportunities  in  quantum  circuit  design,  where  minimizing  the  number  of  gate 
operations to achieve a given function is a fundamental goal. For example, the Toffoli gate can be
implemented with fewer controlled-NOT (CNOT) and 1-qubit gates if equivalence is interpreted
as “equivalence up to relative phase” or, more briefly, “relative-phase equivalence.” Normally the
Toffoli gate requires an equivalent circuit of six CNOT and eight 1-qubit gates to implement it.
Any  relative-phase  differences  present  in  an  equivalent  circuit  can  be  canceled  out  so  long  as  every 
pair  of  these  gates  in  the  circuit  is  strategically  placed.  Since  circuit  minimization  is  being  pursued 
for  a  number  of  key  quantum  arithmetic  circuits  with  many  Toffoli  gates,  such  as  the  modular 
exponentiation  occurring  in  Shor’s  algorithm,  this  type  of  phase  equivalence  could  reduce  the 
number  of  gates  even  further. 

In  our  work  we  distinguish  equivalence  of  quantum  states  up  to  global  phase,  from  equivalence  up 
to  local  phases.  Neither  affects  measurement  results  if  measurement  is  applied  immediately,  but 
may  significantly  alter  the  outcome  if  additional  quantum  gates  are  applied  to  given  states. 
Furthermore, a pair of quantum operators or quantum circuits can be equivalent up to local or
global phase if their outputs are respectively equivalent for all inputs.
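A single-qubit example makes the distinction concrete. The sketch below uses explicit amplitude lists rather than QuIDDs, and the variable names are chosen only for illustration. It compares a state with a globally shifted copy and a copy with a local (relative) phase on one element, first under immediate measurement and then after one additional Hadamard gate.

```python
import math

s = 1 / math.sqrt(2)

def probabilities(state):
    """Measurement probabilities in the computational basis."""
    return [abs(a) ** 2 for a in state]

def hadamard(state):
    """Apply a Hadamard gate to a single-qubit state [a, b]."""
    a, b = state
    return [s * (a + b), s * (a - b)]

psi = [s, s]
global_shift = [1j * a for a in psi]   # every element multiplied by e^{i*pi/2}
relative_shift = [s, -s]               # phase -1 on the second element only

before = [probabilities(v) for v in (psi, global_shift, relative_shift)]
after = [probabilities(hadamard(v)) for v in (psi, global_shift, relative_shift)]
```

All three states give the same probabilities when measured immediately. After the Hadamard gate, `psi` and `global_shift` still agree, while `relative_shift` yields the opposite outcome.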

5.3.1.  Global-Phase  Equivalence 

We describe some algorithms that check equivalence up to global phase of two quantum states or
operators. The first two are well-known linear-algebraic operations, while the remaining
algorithms exploit QuIDD properties explicitly.

Inner Product. Since the quantum-circuit formalism models an arbitrary quantum state $|\psi\rangle$ as a
unit vector, the inner product $\langle\psi|\psi\rangle = 1$. In the case of a global-phase difference between two
states $|\psi\rangle$ and $|\varphi\rangle$, the inner product is the global-phase factor, $\langle\varphi|\psi\rangle = e^{i\theta}\langle\psi|\psi\rangle = e^{i\theta}$. Since $|e^{i\theta}| = 1$
for any $\theta$, checking if the complex modulus of the inner product is 1 suffices to check
global-phase equivalence for states. Although the inner product may be computed using explicit
arrays, a straightforward QuIDD-based implementation is easily derived. The complex-conjugate
transpose and matrix product with QuIDD operands were defined in our previous DARPA-funded
work and implemented in software. Thus, the algorithm computes the complex-conjugate
transpose of A and multiplies the result with B. The complexity of this algorithm is stated in the
following lemma.

Lemma 5.2.1.A. Consider two state QuIDDs A and B with sizes |A| and |B|, respectively, in
number of nodes. The global-phase difference can be computed in O(|A||B|) time and memory.
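With explicit arrays instead of QuIDDs, the inner-product check reduces to a few lines. The sketch assumes that both inputs are unit vectors, as the argument above requires, and the function names are hypothetical. It does not reproduce the QuIDD complexity stated in the lemma, since a dense vector has exponentially many entries.

```python
import cmath
import math

def inner(phi, psi):
    """Inner product <phi|psi> of two amplitude lists."""
    return sum(complex(x).conjugate() * y for x, y in zip(phi, psi))

def differ_by_global_phase(phi, psi, tol=1e-9):
    """True if the unit vectors phi and psi are equal up to a global phase."""
    return abs(abs(inner(phi, psi)) - 1.0) < tol

s = 1 / math.sqrt(2)
psi = [s, 1j * s]
phi = [cmath.exp(0.7j) * a for a in psi]   # psi multiplied by a global phase
```

Here `inner(phi, psi)` has modulus 1 and its argument recovers the phase difference, while two orthogonal states such as `[s, s]` and `[s, -s]` give an inner product of 0.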

Matrix Product. The matrix product of two operators can be used for global-phase equivalence
checking. In particular, since all quantum operators are unitary, the adjoint of each operator is its
inverse. Thus, if two operators U and V differ by a global phase, then $UV^\dagger = e^{i\theta}I$. With QuIDD
representations of U and V, computing $V^\dagger$ requires $O(|V|)$ time and memory. The matrix product
$W = UV^\dagger$ requires $O((|U||V|)^2)$ time and memory. To check if $W = e^{i\theta}I$, any non-zero terminal
value t is chosen from W, and scalar division is performed on W as W' = Apply(W, t, /), which takes
$O((|U||V|)^2)$ time and memory. Since QuIDDs are canonical, checking if W' = I requires only O(1)
time and memory. If W' = I, then t is the global-phase factor.
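
The same steps can be sketched with explicit matrices. This is an illustration under stated assumptions only: operators are square nested lists of complex numbers, both are assumed unitary, and the helper names are hypothetical rather than part of any QuIDD package.

```python
import cmath
import math

def dagger(m):
    """Complex-conjugate transpose of a square matrix given as nested lists."""
    n = len(m)
    return [[m[c][r].conjugate() for c in range(n)] for r in range(n)]

def matmul(a, b):
    """Plain O(n^3) product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def global_phase_via_matrix_product(u, v, tol=1e-9):
    """Return t with U = t * V and |t| = 1, or None if U and V differ otherwise."""
    w = matmul(u, dagger(v))              # W = U V^dagger
    t = w[0][0]                           # candidate phase taken from the diagonal
    if not math.isclose(abs(t), 1.0, abs_tol=tol):
        return None
    n = len(w)
    for i in range(n):
        for j in range(n):
            expected = 1.0 if i == j else 0.0
            if not cmath.isclose(w[i][j] / t, expected, abs_tol=tol):
                return None               # W / t is not the identity
    return t

X = [[0j, 1 + 0j], [1 + 0j, 0j]]
Z = [[1 + 0j, 0j], [0j, -1 + 0j]]
iX = [[1j * e for e in row] for row in X]  # X multiplied by the global phase i
```

Here the candidate t is read from a diagonal position, so a zero there already rules out equivalence; the QuIDD version instead picks a non-zero terminal and relies on canonicity for the final comparison with I.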

Node-Count Check. The previous algorithms merely translate linear-algebraic operations to
QuIDDs, but exploiting the following QuIDD property leads to faster checks.

Lemma 5.2.1.B. The QuIDD A' = Apply(A, c, *), where c is a non-zero complex value, is
isomorphic to A, hence $|A'| = |A|$.

This lemma states that two QuIDD states or operators that differ by a non-zero scalar, such as a
global-phase factor, have the same number of nodes. Thus, the equality of node counts in QuIDDs
is a necessary but not sufficient condition for global-phase equivalence. To see why it is not
sufficient, consider two state vectors $|\psi\rangle$ and $|\varphi\rangle$ with elements $w_j$ and $v_k$, respectively, where
$j, k = 0, 1, \ldots, N-1$. If some $w_j = v_k = 0$ such that $j \neq k$, then $|\varphi\rangle \neq e^{i\theta}|\psi\rangle$. The QuIDD
representations of these states can, in general, have the same node counts. Despite this drawback,
the node-count check requires only O(1) time since Apply is easily augmented to recursively sum the
number of nodes as a QuIDD is created.

Recursive Check. Lemma 5.2.1.B implies that a QuIDD-based algorithm, which takes into
account terminal value differences, checks a sufficient condition for global-phase equivalence.
Pseudocode for such an algorithm called GPRC is shown in Figure 43.




GPRC returns true if two QuIDDs A and B differ by global phase and false otherwise. gp and
have_gp are global variables containing the global-phase factor and a flag signifying whether or
not a terminal node has been reached, respectively. The value of gp is only valid if true is returned.
The first conditional block of GPRC deals with terminal values. The potential global-phase factor
ngp is computed after handling division by 0. If $|ngp| \neq 1$ or if $ngp \neq gp$ when gp has been set, then
the two QuIDDs do not differ by a global phase. Next, the condition specified by Lemma 5.2.1.B
is checked. If the node of A depends on a different row or column variable than the node of B, then
A and B are not isomorphic and thus cannot differ by global phase. Finally, GPRC is called
recursively, and the results of these calls are combined via the logical AND operation.


GPRC(A, B, gp, have_gp) {
    if (Is_Constant(A) and Is_Constant(B)) {
        // a 0 terminal can only match another 0 terminal
        if (Value(B) == 0) return (Value(A) == 0)
        ngp = Value(A) / Value(B)
        // a phase factor must have complex modulus 1
        if (sqrt(real(ngp) * real(ngp) +
                 imag(ngp) * imag(ngp)) != 1)
            return false
        if (!have_gp) {
            gp = ngp
            have_gp = true
        }
        return (ngp == gp)
    }
    if ((Is_Constant(A) and !Is_Constant(B))
            or (!Is_Constant(A) and Is_Constant(B)))
        return false
    if (Var(A) != Var(B)) return false
    return (GPRC(Then(A), Then(B), gp, have_gp)
            and GPRC(Else(A), Else(B), gp, have_gp))
}


Figure 43. Pseudo-code for the recursive global-phase equivalence check.

Early  termination  occurs  when  isomorphism  is  violated  or  more  than  one  phase  difference  is 
computed.  In  the  worst  case,  both  QuIDDs  will  be  isomorphic,  but  the  last  terminal  visited  in  each 
QuIDD  will  differ  by  more  than  a  global-phase  factor,  causing  full  traversals  of  both  QuIDDs. 
Thus, the overall runtime and memory complexity of GPRC for states or operators is $O(|A| + |B|)$.
Also,  the  node-count  check  can  be  run  before  GPRC  to  quickly  eliminate  many  non-equivalences. 
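
The node-count check and GPRC can be tried on a toy decision diagram. The sketch below is an assumption-laden illustration, not QuIDDPro code: terminals are Python complex numbers, internal nodes are (variable, then, else) tuples, the global variables gp and have_gp are replaced by a dictionary, and exact floating-point comparisons are replaced by a tolerance.

```python
import cmath

def build(vec, var=0):
    """Build a reduced decision diagram for a vector of 2**n amplitudes."""
    if len(vec) == 1:
        return complex(vec[0])
    half = len(vec) // 2
    else_child = build(vec[:half], var + 1)   # variable = 0
    then_child = build(vec[half:], var + 1)   # variable = 1
    if then_child == else_child:              # reduction rule: skip redundant test
        return then_child
    return (var, then_child, else_child)

def node_count(dd, seen=None):
    """Count distinct nodes; structurally equal subgraphs are counted once."""
    seen = set() if seen is None else seen
    if dd in seen:
        return 0
    seen.add(dd)
    if isinstance(dd, complex):
        return 1
    return 1 + node_count(dd[1], seen) + node_count(dd[2], seen)

def gprc(a, b, state, tol=1e-9):
    """Recursive global-phase check; state['gp'] holds the factor on success."""
    a_const, b_const = isinstance(a, complex), isinstance(b, complex)
    if a_const and b_const:
        if b == 0:
            return a == 0
        ngp = a / b
        if not cmath.isclose(abs(ngp), 1.0, abs_tol=tol):
            return False
        if state.get("gp") is None:
            state["gp"] = ngp
        return cmath.isclose(ngp, state["gp"], abs_tol=tol)
    if a_const != b_const:
        return False
    if a[0] != b[0]:                          # nodes test different variables
        return False
    # 'and' short-circuits, which gives the early termination described above
    return gprc(a[1], b[1], state, tol) and gprc(a[2], b[2], state, tol)

c = cmath.exp(0.7j)
vec = [0.5, 0.5, 0.5, -0.5]
scaled = [c * x for x in vec]                 # equal to vec up to global phase
vec2 = [0.5, 0.5, -0.5, 0.5]                  # same node count, not equivalent
```

In this example vec and vec2 produce diagrams with equal node counts, yet gprc rejects them, which matches the statement that the node count is a necessary but not sufficient condition.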

Empirical  Results.  The  first  benchmark  considered  is  a  single  iteration  of  Grover’s  quantum 
search  algorithm,  where  the  oracle  searches  for  the  last  item  in  the  database.  One  iteration  is 
sufficient  to  test  the  effectiveness  of  the  algorithms,  since  the  state  vector  QuIDD  remains 
isomorphic  across  all  iterations,  as  was  proven  in  our  past  DARPA-sponsored  work. 

Figure  44  shows  the  runtime  results  for  the  inner  product  and  GPRC  algorithms.  Results  for  the 
node-count check algorithm are not shown since it runs in O(1) time.

Figure 44. (a) Runtime results and regressions for the inner product and GPRC on checking global-phase
equivalence in a Grover iteration, and (b) QuIDD size of the state vector.


These results confirm the asymptotic complexity difference between the inner-product and GPRC
algorithms. Because the number of nodes in the QuIDD state vector after a Grover iteration is
$O(n)$, the runtime complexity of the inner product should be $O(n^2)$, which is confirmed by a
regression plot within only a 1% error. In contrast, the runtime complexity of the GPRC algorithm
should be $O(n)$, which is also confirmed by a regression plot, within the same error margin.

We next compared runtimes of the matrix-product and GPRC algorithms checking the Grover
operator. Our previous work shows that the QuIDD representation of this operator grows in size as
$O(n)$, which is confirmed in our empirical data. Therefore, the runtime of the matrix product
should be quadratic in n, but linear in n for GPRC. Regression plots verify these complexities
within 0.3% error. We also compared states that appear in Shor’s integer factorization algorithm.
In particular, we consider states created by the modular exponentiation sub-circuit that represent
all possible combinations of x and $f(x, N) = a^x \bmod N$, where N is the integer to be factored. Each of
the $O(2^n)$ paths to a non-0 terminal encodes a binary value for x and $f(x, N)$. This experiment
demonstrates how the algorithms fare with exponentially-growing QuIDDs.

In our experiments, each N is an integer whose two non-trivial factors are prime (such numbers are
often called semi-prime); a is set to N-2, but in general can be chosen randomly from the range
[2..N-2]. In our experiments, states $|\psi\rangle$ and $|\varphi\rangle$ are equal up to global phase. The node counts for
both states are equal as predicted by Lemma 5.2.1.B. Interestingly, both algorithms exhibit very
similar performance. Further results were produced for the cases in which Hadamard gates are
applied to the first, middle, and last qubits, respectively, of $|\varphi\rangle$. These results show that early
termination in GPRC can enhance performance by a factor of roughly 1.5x to 10x. In almost every
case, both algorithms represent far less than 1% of the total runtime.
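
To make the benchmark states concrete, the pairs encoded by the modular-exponentiation output can be enumerated classically. This is a small sketch under assumptions: N = 15 and a 3-bit x register are chosen purely for illustration and are not the instances used in the experiments, and the function name is hypothetical.

```python
def modexp_pairs(modulus, base, num_bits):
    """List (x, base**x mod modulus) for every x representable in num_bits bits."""
    return [(x, pow(base, x, modulus)) for x in range(2 ** num_bits)]

N = 15                          # semi-prime 3 * 5, illustrative only
a = N - 2                       # base chosen as in the experiments above
pairs = modexp_pairs(N, a, 3)   # one pair per basis state with non-zero amplitude
```

Each pair corresponds to one path to a non-0 terminal in the state QuIDD, which is why the number of such paths doubles with every additional bit of x.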

Thus, checking for global-phase equivalence among QuIDD states appears to be easy once the
QuIDDs are created. An interesting side note is that in modular exponentiation, some QuIDD
states with more qubits have more exploitable structure than those with fewer qubits. For instance,
the N = 387929 (19 qubits) QuIDD has fewer than half the nodes of the N = 163507 (18 qubits)
QuIDD. We also compared the matrix-product and GPRC algorithms checking the inverse QFT
operator. While the inverse QFT is key to Shor’s algorithm, its n-qubit QuIDD representation
grows as $O(2^{2n})$, and the asymptotic differences in the matrix product and GPRC are very
noticeable. Also, the memory usage indicates that the matrix product may need asymptotically
more intermediate memory despite operating on QuIDDs with the same number of nodes as
GPRC.

5.3.2.  Relative-Phase  Equivalence 

Like the global-phase case, the relative-phase equivalence checking problem can be solved in
several ways. Our first three algorithms adapt standard linear algebra to QuIDDs, while the last
two exploit QuIDD properties directly, offering asymptotic runtime and memory improvements.

Modulus and Inner Product. Consider two state vectors $|\psi\rangle$ and $|\varphi\rangle$ that are equal up to relative
phase and have complex-valued elements $w_j$ and $v_k$, respectively, where $j, k = 0, 1, \ldots, N-1$.
Computing $|\varphi'\rangle = \sum_{j=0}^{N-1} |v_j|\,|j\rangle$ and
$|\psi'\rangle = \sum_{k=0}^{N-1} |w_k|\,|k\rangle = \sum_{k=0}^{N-1} |e^{i\theta_k} v_k|\,|k\rangle$ sets each phase
factor to 1, allowing the inner product to be applied. The complex modulus operations are
computed as C = Apply(A, |·|) and D = Apply(B, |·|) with runtime and memory complexity
$O(|A| + |B|)$, which is dominated by the $O(|A||B|)$ complexity of the inner product.

Modulus and Matrix Product. For operator equivalence up to relative phase, there are two cases
depending on whether the diagonal relative-phase matrix appears on the left or right side of one of
the operators. Consider two operators U and V with elements $u_{j,k}$ and $v_{j,k}$, respectively. The two
cases in which the relative-phase factors appear on either side of V are described as
$u_{j,k} = e^{i\theta_j} v_{j,k}$ (left side) and $u_{j,k} = e^{i\theta_k} v_{j,k}$ (right side). In either case, the matrix
product check discussed above may be extended by computing the complex modulus without
increasing the overall complexity. Note that neither this algorithm nor the modulus and
inner-product algorithms calculate the relative-phase factors.

Element-wise Division. Given the states discussed for the modulus and inner product check,
$w_k = e^{i\theta_k} v_k$, the operation $w_k / v_j$ for each $j = k$ produces a relative-phase factor, $e^{i\theta_k}$. The
condition $|w_k / v_j| = 1$ is used to check if each division yields a relative phase. If this condition
is satisfied for all divisions, the states are equal up to relative phase.

The QuIDD implementation for states is simply C = Apply(A, B, /), where Apply is augmented to
avoid division by 0 and instead return 1 when two terminal values being compared equal 0, and
return 0 otherwise. Apply can be further augmented to terminate early when $|w_j / v_j| \neq 1$. C is a
QuIDD vector containing the relative-phase factors. If C contains a terminal value 0, then A and B
differ by more than a relative phase. Since one call to Apply implements this algorithm, the
runtime and memory complexity are $O(|A||B|)$.
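
On explicit arrays, the augmented division can be sketched as follows. This is a hedged illustration only: the states are plain lists of complex amplitudes, the function name is hypothetical, and returning None stands in for a result containing a 0 terminal.

```python
import cmath
import math

def relative_phases(w, v, tol=1e-9):
    """Return the factors w_k / v_k, or None if w and v differ by more than
    a relative phase. Positions where both entries are 0 yield the factor 1."""
    phases = []
    for wk, vk in zip(w, v):
        if wk == 0 and vk == 0:
            phases.append(1 + 0j)        # 0/0 is treated as a trivial phase
        elif wk == 0 or vk == 0:
            return None                  # zeros appear in different positions
        else:
            ratio = wk / vk
            if not math.isclose(abs(ratio), 1.0, abs_tol=tol):
                return None              # early termination: not a phase factor
            phases.append(ratio)
    return phases

r = 1 / math.sqrt(2)
v = [complex(r), 0j, 0j, complex(r)]
w = [cmath.exp(0.345j) * r, 0j, 0j, cmath.exp(1.1j) * r]
mismatch = [complex(r), complex(r), 0j, 0j]
```

Unlike the modulus-based checks, this variant returns the phase factors themselves, which is the property the text attributes to element-wise division.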

Element-wise division for operators is more complicated. For QuIDD operators U and V, W =
Apply(U, V, /) corresponds to a matrix with the relative-phase factor along row j in the case of
phases appearing on the left side, and along column j in the case of phases appearing on the right
side. In the first case, all rows of W are identical, meaning that the support of W does not contain
any row variables. Similarly, in the second case the support of W does not contain any column
variables. A complication arises when 0 values appear in either operator. In such cases, the
support of W may contain both variable types, but the operators may in fact be equal up to relative
phase.

We  now  present  an  algorithm  based  on  Apply,  which  accounts  for  these  special  cases  by  using  a 
sentinel  value  of  2  to  mark  valid  0  entries  that  do  not  affect  relative-phase  equivalence.  These 
entries  are  recursively  ignored  by  skipping  either  row  or  column  variables  with  sentinel  children  (S 
specifies  row  or  column  variables),  which  effectively  fills  copies  of  neighboring  row  or  column 
phase  values  in  their  place  in  W.  The  algorithm  must  be  run  twice,  once  for  each  variable  type. 
The size of W is $O(|U||V|)$ since it is created by a variant of Apply.

RP_DIV(A, B, S) {
    if (A == New_Terminal(0)) {
        if (B != New_Terminal(0))
            return New_Terminal(0)
        return New_Terminal(2)          // sentinel: valid 0/0 entry
    }
    if (Is_Constant(A) and Is_Constant(B)) {
        nrp = Value(A) / Value(B)
        if (sqrt(real(nrp) * real(nrp) +
                 imag(nrp) * imag(nrp)) != 1)
            return New_Terminal(0)
        return New_Terminal(nrp)
    }
    if (Table_Lookup(R, RP_DIV, A, B, S)) return R
    v = Top_Var(A, B)
    T = RP_DIV(A_v, B_v, S)             // cofactors with v = 1
    E = RP_DIV(A_v', B_v', S)           // cofactors with v = 0
    if ((T == New_Terminal(0)) or
        (E == New_Terminal(0)))
        return New_Terminal(0)
    if ((T != E) and (Type(v) == S)) {
        if (Is_Constant(T) and Value(T) == 2)
            return E
        if (Is_Constant(E) and Value(E) == 2)
            return T
        return New_Terminal(0)
    }
    if (Is_Constant(T) and Value(T) == 2)
        T = New_Terminal(1)
    if (Is_Constant(E) and Value(E) == 2)
        E = New_Terminal(1)
    R = ITE(v, T, E)
    Table_Insert(R, RP_DIV, A, B, S)
    return R
}


Figure 45. Pseudo-code for the element-wise division algorithm.
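
The role of the sentinel can be seen in an explicit-matrix analogue. The sketch below is not a translation of the figure: it assumes operators are nested lists of complex numbers, uses None where the pseudo-code uses the sentinel value 2, and takes the phases as $u_{j,k} = d_j v_{j,k}$ for side "left" and $u_{j,k} = d_k v_{j,k}$ for side "right". All names are hypothetical.

```python
import cmath
import math

def diagonal_phases(u, v, side, tol=1e-9):
    """Return unit-modulus phases d with U = diag(d) V (side='left') or
    U = V diag(d) (side='right'), or None if no such d exists."""
    n = len(u)
    phases = [None] * n                   # None marks "no information yet"
    for j in range(n):
        for k in range(n):
            ujk, vjk = u[j][k], v[j][k]
            if ujk == 0 and vjk == 0:
                continue                  # valid 0/0 entry, ignored
            if ujk == 0 or vjk == 0:
                return None               # zero patterns differ
            ratio = ujk / vjk
            if not math.isclose(abs(ratio), 1.0, abs_tol=tol):
                return None
            idx = j if side == "left" else k
            if phases[idx] is None:
                phases[idx] = ratio
            elif not cmath.isclose(ratio, phases[idx], abs_tol=tol):
                return None               # conflicting phases in one row/column
    return [1 + 0j if p is None else p for p in phases]

r = 1 / math.sqrt(2)
V = [[r + 0j, r + 0j], [r + 0j, -r + 0j]]          # Hadamard matrix
d = [cmath.exp(0.2j), cmath.exp(0.9j)]
U_left = [[d[j] * V[j][k] for k in range(2)] for j in range(2)]
```

As in the text, the check is run once per side; an operator pair may pass for one side and fail for the other.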

Non-0 Terminal Merge. A necessary condition for relative-phase equivalence is that zero-valued
elements of each state vector appear in the same locations, as expressed by the following lemma.

Lemma 5.2.2. A necessary but not sufficient condition for two states $|\varphi\rangle = \sum_{j=0}^{N-1} v_j |j\rangle$
and $|\psi\rangle = \sum_{k=0}^{N-1} w_k |k\rangle$ to be relative-phase equivalent is that whenever $v_j = w_k = 0$, $j = k$.

QuIDD canonicity may be exploited with this condition. Let A and B be the QuIDD
representations of states $|\psi\rangle$ and $|\varphi\rangle$, respectively. First, compute C = Apply(A, ⌈|·|⌉) and D =
Apply(B, ⌈|·|⌉), which converts every non-zero terminal value of A and B into a 1. Since C and D
have only two terminal values, 0 and 1, checking if C and D are equal satisfies Lemma 5.2.2.
Canonicity ensures this check requires O(1) time and memory. The overall runtime and memory
complexity of this algorithm is $O(|A| + |B|)$ due to the unary Apply operations. This algorithm can
also be applied to operators, since Lemma 5.2.2 also applies to $u_{j,k} = e^{i\theta_j} v_{j,k}$ (phases on the left) and
$u_{j,k} = e^{i\theta_k} v_{j,k}$ (phases on the right) for operators U and V.

Modulus and DD Compare. A variant of the modulus and inner-product check, which also
exploits the canonicity of QuIDDs, provides an asymptotic improvement when checking a
necessary and sufficient condition for relative-phase equivalence of states and operators. As with
the modulus and inner product check, compute C = Apply(A, |·|) and D = Apply(B, |·|). If A and B
are equal up to relative phase, then C = D since each phase factor becomes a 1. Canonicity again
ensures that this check requires O(1) time and memory. Thus, the runtime and memory
complexity of this algorithm is dominated by the unary Apply operations, giving $O(|A| + |B|)$.
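
Both canonicity-based checks have simple array analogues. The sketch below assumes states stored as lists of complex amplitudes and uses hypothetical function names; on arrays the final comparison is linear in the vector length, whereas on canonical QuIDDs it is a constant-time pointer comparison.

```python
import math

def same_zero_pattern(w, v):
    """Non-0 terminal merge: map every non-zero entry to 1 and compare."""
    return [x != 0 for x in w] == [x != 0 for x in v]

def same_moduli(w, v, tol=1e-9):
    """Modulus compare: element-wise complex moduli must agree."""
    return len(w) == len(v) and all(
        math.isclose(abs(x), abs(y), abs_tol=tol) for x, y in zip(w, v))

r = 1 / math.sqrt(2)
v = [complex(r), 0j, 0j, complex(r)]
w = [1j * r, 0j, 0j, -r + 0j]             # v with relative phases i and -1
skewed = [0.6 + 0j, 0j, 0j, 0.8 + 0j]     # same zero pattern, other moduli
```

The vector named skewed passes the zero-pattern test but fails the modulus test, illustrating why the former is only a necessary condition.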

5.3.3.  Empirical  Validation 

We now present experimental results for the relative-phase equivalence-checking algorithms. The
first benchmark circuit creates a remote EPR pair between the first and last qubits, via
nearest-neighbor interactions. Given an initial state $|00\ldots0\rangle$, it creates the remote EPR-pair state
$(1/\sqrt{2})(|00\ldots0\rangle + |10\ldots1\rangle)$. The circuit size is varied, and the final state is compared to a
relative-phase variant of this state whose $|00\ldots0\rangle$ amplitude is $e^{0.345i}/\sqrt{2}$. Our data show that all
of our algorithms run quickly.

For example, the inner product is the slowest algorithm, yet for a 1000-qubit instance, it runs in 0.2
seconds, a small fraction of the 7.6 seconds required to create the QuIDD state vectors.

Figure 46. (a) Runtime results and regressions for various algorithms to check for relative-phase equivalence
of the remote EPR pair circuit, and (b) size of the QuIDD states.

Curve-fitting of the runtime and memory data reveals linear complexity for all algorithms to within
1% error. This is not unexpected since the QuIDD representations of the states grow linearly with
the number of qubits, and the complex modulus reduces the number of different terminals prior to
computing the inner product. These results suggest that in practice, the inner-product and
element-wise division algorithms perform better than their worst-case complexity. Element-wise
division should be preferred when QuIDD states are compact since, unlike the other algorithms, it
computes the relative-phase factors (this was also confirmed by another series of experiments
reported in our ICCAD 2007 publication detailing this work).


5.3.4.  Summary 

Although QuIDD properties like canonicity enable exact equivalence checking in O(1) time, we
have  shown  that  such  properties  may  be  further  exploited  to  develop  efficient  algorithms  for  the 
difficult  problem  of  equivalence  checking  up  to  global  and  relative  phase.  In  particular,  the 
global-phase  recursive  check  and  element-wise  division  algorithms  efficiently  determine 
equivalence  of  states  and  operators  up  to  global  and  relative  phase,  while  also  computing  the 
phases.  In  practice,  they  outperform  QuIDD  implementations  of  the  inner  and  matrix  product, 
which  do  not  compute  relative-phase  factors.  Other  QuIDD  algorithms  presented  in  this  chapter, 
such  as  the  node-count  check,  non-0  terminal  merge,  and  modulus  and  DD  compare,  further 
exploit  decision-diagram  properties  to  provide  even  faster  checks,  but  only  satisfy  necessary 
conditions  for  equivalence.  Thus,  they  should  be  used  to  aid  the  more  robust  algorithms.  A 
summary  of  our  theoretical  results  on  equivalence-checking  is  given  in  Figure  47. 


Algorithm                   Phase      Finds     Necessary &   O(·) time complexity
                            type       phases?   sufficient?   best-case     worst-case

Inner product               Global     Yes       Yes           |A||B|        |A||B|
Matrix product              Global     Yes       Yes           (|A||B|)^2    (|A||B|)^2
Node-count                  Global     No        Nec. only     1             1
Recursive check             Global     Yes       Yes           1             |A| + |B|
Modulus and inner product   Relative   No        Yes           |A||B|        |A||B|
Element-wise division       Relative   Yes       Yes           |A||B|        |A||B|
Non-0 terminal merge        Relative   No        Nec. only     |A| + |B|     |A| + |B|
Modulus and DD compare      Relative   No        Yes           |A| + |B|     |A| + |B|

Figure  47.  Key  properties  of  the  QuIDD-based  phase-equivalence  checking  algorithms. 

5.3.5. Adaptive Equivalence-Checking


To  develop  a  more  efficient  methodology  for  equivalence-checking  of  quantum  circuits,  we 
exploit  the  reversibility  of  these  circuits  to  define  a  new  concept,  called  a  reversible  miter  —  a 
natural  counterpart  of  miter  circuits  used  in  equivalence-checking  of  conventional  circuits.  In 
conjunction  with  existing  techniques  for  iterative  circuit  simplification,  reversible  miters  can 
drastically  reduce  the  size  of  verification  instances.  Our  method  is  adaptive  in  the  sense  that  it 
utilizes  multiple  techniques  appropriate  for  different  classes  of  quantum  circuit  modules.  In  this 
context,  we  study  reversible  circuits  which  are  a  subset  of  quantum  circuits  that  map  conventional 
0-1 bit-strings into other such bit-strings. In particular, the largest module in Shor’s
number-factoring algorithm, modular exponentiation, is implemented as a reversible circuit, but
acting on entangled quantum states. It exceeds all other modules asymptotically in size, and thus
requires most attention of CAD tools. To verify such logic modules, we adapt conventional
state-of-the-art techniques in several ways, and significantly scale up quantum equivalence
checking. Our empirical comparisons confirm that properties of reversible circuits can enable
much faster SAT-based equivalence-checking. However, conventional techniques cannot be
applied  to,  for  example,  the  Quantum  Fourier  Transform.  Therefore,  we  also  study 
equivalence-checking  of  circuits  with  non-conventional  gates  (properly-quantum  circuits),  and  the 
integration  of  heterogeneous  techniques. 


5.3.6.  Preliminaries 

Consider  gates  NOT,  CNOT  and  TOFFOLI  acting  on  classical  bits.  They  can  be  implemented 
using NOT, XOR and AND gates. In the quantum case, they exchange basis states, which is why
their matrices contain only 0s and 1s. As these gates obey the same algebraic rules in both cases,
we term them conventional gates. In comparison, the matrix of the Hadamard gate contains $1/\sqrt{2}$,
and  its  functionality  cannot  be  expressed  in  Boolean  logic.  Therefore  we  call  such  gates 
properly-quantum.  Each  properly-quantum  gate  maps  at  least  one  0-1  input  combination  (basis 
state)  to  a  quantum  superposition  of  more  than  one  basis  state.  Measurement-free  quantum 
computation  is  reversible  in  nature,  thus  quantum  circuits  are  reversible  in  that  (i)  they  map  their 
input  configurations  to  output  configurations  one-to-one,  (ii)  this  property  is  also  observed  locally 
for  every  gate  and  sub-circuit. 

Popular  quantum  algorithms  contain  large,  application-specific  sections  dedicated  to  the 
computation  of  conventional  Boolean  functions.  In  some  cases,  the  input  to  a  quantum  algorithm 
is  read  by  these  conventional  sections  and  later  factored  into  quantum  states.  In  order  to  embed 
conventional  computation  into  the  quantum  domain,  it  must  be  made  reversible,  and  standard 
procedures  exist  for  such  transformations.  The  resulting  circuits  do  not  use  quantum  properties, 
except  that  they  can  be  applied  to  quantum  data  (superposition  states),  allowing  them  to  perform 
classical  operations  on  many  inputs  at  once.  Leveraging  this  quantum  parallelism  in  useful 
applications  is  difficult,  but  can  be  illustrated  by  Shor’s  polynomial-time  algorithm  for 
number-factoring [147]. This algorithm is dominated by a reversible module that performs
modular  exponentiation,  applied  before  the  QFT.  We  call  such  circuits  without  properly-quantum 
gates  specifically  reversible  circuits  here. 

5.3.7. Reversible Miters


To check the equivalence of two conventional combinational circuits, $C_1$ and $C_2$, one checks if the
conventional miter circuit XOR($C_1$, $C_2$) implements the constant-0 function. For multi-output
circuits, corresponding output bits are XORed and the results are ORed, so that even a single
mismatch can be detected by existing powerful tools for Boolean circuit analysis. Conventional
miters can be constructed for reversible circuits by treating these circuits as AND-OR-NOT
circuits, except that the miter will not be reversible. However, this approach does not work for
properly-quantum circuits. To address these limitations, we introduce reversible miters which
exploit reversibility and can handle properly-quantum gates. We also discover synergies with
simplification of reversible circuits [140], [142], [143]. Now consider quantum (or reversible)
circuits $C_1$ and $C_2$. Recall that the concatenation $C_1 \cdot C_2$ of $C_1$ and $C_2$ is also a quantum (or
reversible) circuit.
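
A conventional miter can be illustrated by exhaustive evaluation on a small example. This is a sketch under assumptions: circuits are modeled as Python functions from an input bit tuple to an output bit tuple, the example circuits are hypothetical, and brute-force enumeration stands in for the SAT or BDD engines a real tool would use.

```python
from itertools import product

def miter_is_constant_zero(c1, c2, num_inputs):
    """XOR corresponding outputs, OR the results, and check that the miter
    evaluates to 0 on every input combination."""
    for bits in product((0, 1), repeat=num_inputs):
        out1, out2 = c1(bits), c2(bits)
        if any(a ^ b for a, b in zip(out1, out2)):
            return False                  # a single mismatch is enough
    return True

def half_adder_a(bits):
    x, y = bits
    return (x ^ y, x & y)

def half_adder_b(bits):
    x, y = bits
    # sum bit written with AND/OR/NOT only; same function as x ^ y
    return ((x | y) & (1 - (x & y)), x & y)

def broken_adder(bits):
    x, y = bits
    return (x | y, x & y)                 # wrong sum bit when x = y = 1
```

The enumeration is exponential in the number of inputs, which is why practical equivalence checkers replace it with SAT solving or decision diagrams.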

Observation 1: Given a quantum (or reversible) circuit $C = g_1 \cdot g_2 \cdot \ldots \cdot g_k$ where $g_i$ is a gate,
its copy where all gates are inverted and put in the reverse order, i.e., $g_k^{-1} \cdot \ldots \cdot g_2^{-1} \cdot g_1^{-1}$,
implements the inverse transformation to what C implements. We therefore denote it by $C^{-1}$. Note
that NOT, CNOT, and Toffoli gates are their own inverses. The circuit $C \cdot C^{-1}$ is equivalent to an
empty circuit. This can be confirmed by iteratively cancelling out pairs of mutually-inverse
adjacent gates. This observation motivates our new notion of reversible miters.
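
The observation can be checked by simulation on classical bit-strings. The sketch below assumes a hypothetical gate encoding in which each gate is a tuple (controls..., target), so that (t,) is NOT, (c, t) is CNOT and (c1, c2, t) is Toffoli; it covers only these self-inverse conventional gates.

```python
from itertools import product

def apply_gate(gate, bits):
    """Flip the target bit when all control bits are 1."""
    *controls, target = gate
    if all(bits[c] for c in controls):
        bits[target] ^= 1

def run(circuit, bits):
    """Apply the gates left to right and return the output bit tuple."""
    state = list(bits)
    for gate in circuit:
        apply_gate(gate, state)
    return tuple(state)

def inverse(circuit):
    """NOT, CNOT and Toffoli are self-inverse, so only the order is reversed."""
    return list(reversed(circuit))

def is_identity(circuit, num_bits):
    """True if every input bit-string is mapped to itself."""
    return all(run(circuit, b) == b for b in product((0, 1), repeat=num_bits))

C = [(0,), (0, 1), (0, 1, 2), (2, 1)]     # NOT, CNOT, Toffoli, CNOT on 3 bits
```

Here C followed by inverse(C) maps every 3-bit input to itself, while C alone does not.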

Definition 1: Given two quantum (or reversible) circuits $C_1$ and $C_2$, their reversible miter is
defined to be one of the following circuits: $C_1 \cdot C_2^{-1}$, $C_2^{-1} \cdot C_1$, $C_2 \cdot C_1^{-1}$, $C_1^{-1} \cdot C_2$. In
particular, for conventional miters one needs to check that the output functions implement the
constant 0 function, whereas for reversible miters one checks that each output bit is equivalent to a
corresponding input bit. Namely, $C_1$ and $C_2$ are functionally equivalent if and only if all of their
reversible miters implement the identity transformation. In particular, if one miter implements the
identity, then so do the remaining miters. If $C_1 = C_2$, then straightforward circuit simplification
[140, 142, 143] cancels out all gates, resulting in an empty circuit. Some of the variant miters
enable more cancellations than others, e.g., if $C_1$ and $C_2$ differ only in their first segments,
$C_2 \cdot C_1^{-1}$ exhibits many gate cancellations. Reversible miters speed up equivalence-checking by
exploiting similarities in circuits by two distinct mechanisms.

1)  Local  Reduction  of  Reversible  Miters:  When  two  conventional  circuits  end  with  identical  gate 
sequences,  one  cannot  cancel  out  these  sequences  because  of  observability  don’t-cares  introduced 
by  them.  However,  reversible  circuits  do  not  experience  don’t-cares,  and  identical  suffixes  always 
cancel out. Note that a reversible miter $C_1 \cdot C_2^{-1}$ places the last gate of $C_1$ next to the last gate of
$C_2$. If these two gates cancel out, the second-to-last gates from $C_1$ and $C_2$ become adjacent, etc.
Thus, no search is required to identify these gate cancellations, and they can be performed one at a
time. Even if the last two gates are different, it may be possible to cancel out second-to-last gates,
as long as the last and second-to-last gates do not act on the same (qu)bit lines. These are special
cases of much more general local reductions discussed in [140, 142, 143]. If $C_1$ and $C_2$ are
identical,  an  empty  circuit  will  result,  but  this  outcome  is  also  possible  when  local  reductions  can 
prove  equivalence  of  two  structurally  different  circuits.  A  systematic  procedure  for  applying 
reductions  was  introduced  in  [143].  Local  reductions  in  reversible  circuits  are  particularly  easy  to 
perform,  are  fast  and  do  not  consume  much  memory  [140,  142]. 
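
Suffix cancellation in a reversible miter can be sketched with a stack. The gate encoding (controls..., target) and the function names are assumptions made for illustration, and only the simplest rule is implemented: two adjacent identical self-inverse gates cancel.

```python
def inverse(circuit):
    """Reverse the gate order; NOT, CNOT and Toffoli gates are self-inverse."""
    return list(reversed(circuit))

def reversible_miter(c1, c2):
    """Return C1 followed by the inverse of C2."""
    return list(c1) + inverse(c2)

def cancel_adjacent(circuit):
    """Delete adjacent identical gates until none remain next to each other."""
    stack = []
    for gate in circuit:
        if stack and stack[-1] == gate:
            stack.pop()                   # the pair is mutually inverse
        else:
            stack.append(gate)
    return stack

suffix = [(0, 1), (0, 1, 2), (2,)]
c1 = [(0,)] + suffix                      # circuits differ only in the first gate
c2 = [(1,)] + suffix
reduced = cancel_adjacent(reversible_miter(c1, c2))
```

For these two circuits the shared three-gate suffix disappears and only the two differing gates remain, so any subsequent check operates on a much smaller instance.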

In our experiments, even the simplest reduction rules can dramatically simplify reversible miters.
More sophisticated reductions from [140, 142, 143] provide an additional boost.

We experimented with the following reduction procedure. In a miter circuit, consider one gate at a
time, search for a matching inverse, and try to move them together to facilitate cancellation. Any
two gates can be swapped if they do not act on the same (qu)bit lines. Two adjacent NOT, CNOT
or Toffoli gates can be swapped if the control bit of one gate is not the target bit of the other gate
(same for properly-quantum controlled-U gates). In our procedure, for the purposes of
equivalence-checking, we temporarily consider the miter circuit to be “circular” by connecting its
outputs to its inputs. Namely, we allow moving the last gate to the beginning of the circuit. This
transformation does not change the equivalence of the entire circuit to the identity. In other words,
if $g_1 \cdot g_2 \cdot \ldots \cdot g_{k-1} \cdot g_k = I$ (identity), then
$g_k \cdot g_1 \cdot g_2 \cdot \ldots \cdot g_{k-1} = g_k \cdot (g_1 \cdot g_2 \cdot \ldots \cdot g_{k-1} \cdot g_k) \cdot g_k^{-1} = g_k \cdot I \cdot g_k^{-1} = g_k \cdot g_k^{-1} = I$.
Therefore, to check equivalence between $g_1 \cdot g_2 \cdot \ldots \cdot g_{k-1} \cdot g_k$ and I is the same as to check
equivalence between $g_k \cdot g_1 \cdot g_2 \cdot \ldots \cdot g_{k-1}$ and I.
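
The swap rule and the circular view can be sketched as follows. The gate encoding (controls..., target) and all function names are assumptions for illustration; the procedure below implements only the cancellation of identical self-inverse gates separated by gates that may be swapped past, not the more general reductions of [140, 142, 143].

```python
def can_swap(g1, g2):
    """Adjacent gates may be swapped if neither gate's target bit is a
    control bit of the other gate."""
    *c1, t1 = g1
    *c2, t2 = g2
    return t1 not in c2 and t2 not in c1

def reduce_with_swaps(circuit):
    """Cancel pairs of identical gates that can be moved next to each other."""
    gates = list(circuit)
    changed = True
    while changed:
        changed = False
        for i, g in enumerate(gates):
            j = i + 1
            # slide g to the right past gates it may be swapped with
            while j < len(gates) and gates[j] != g and can_swap(g, gates[j]):
                j += 1
            if j < len(gates) and gates[j] == g:
                del gates[j], gates[i]    # j > i, so delete the later gate first
                changed = True
                break
    return gates

def rotate_last_to_front(circuit):
    """Treat the circuit as circular: move the last gate to the beginning."""
    return circuit[-1:] + circuit[:-1]

commuting = [(0, 1), (2,), (0, 1), (2,)]      # reducible to the empty circuit
blocked = [(0, 1), (1, 2), (0, 1), (1, 2)]    # target of one gate controls the other
```

Rotation never changes whether a circuit equals the identity, so it can be applied freely when searching for further cancellations.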

2) Reduction of Canonical Forms: Iterative circuit simplification is not guaranteed to reduce
$C_1 \cdot C_2^{-1}$ to the empty circuit in a polynomial number of steps when such a reduction is possible.
Finding a short reduction may be time-consuming. However, when constructing canonical forms
(ROBDDs or QuIDDs) of reversible miters, a different kind of reduction may occur. Suppose that
$C_1$ and $C_2$ end with functionally-equivalent but structurally distinct suffixes that do not admit local
reductions (an example is given in [5]). In other words, $C_1 = A_1 \cdot B_1$ and $C_2 = A_2 \cdot B_2$ where
$B_1 \equiv B_2$. Then $C_1 \cdot C_2^{-1} = A_1 \cdot B_1 \cdot B_2^{-1} \cdot A_2^{-1} \equiv A_1 \cdot A_2^{-1}$. As we traverse the miter $C_1 \cdot C_2^{-1}$,
adding one gate at a time to the decision diagram (DD), the size of the intermediate DDs depends
only on the transformation implemented by the current circuit prefix, i.e., the functions of the
intermediate wires. The intermediate DD for $A_1 \cdot B_1 \cdot B_2^{-1}$ can be smaller than that for $A_1 \cdot B_1$ if
$A_1 \cdot B_1 \cdot B_2^{-1} \equiv A_1$. This phenomenon was observed in our experiments.

5.3.8.  Methodology  for  Equivalence  Checking 

We  now  introduce  equivalence-checking  for  quantum  circuits  based  on  several  techniques 
appropriate  for  different  classes  of  quantum  circuits.  The  first  class  contains  reversible  circuits 
that  arise  as  key  modules  in  quantum  algorithms.  To  check  the  equivalence  of  two  reversible 
circuits, $C_1$ and $C_2$, one can pursue two strategies. The first strategy is to check that the
conventional miter implements the constant 0 function. A conventional miter can also be applied
to reversible circuits as explained below. The second strategy is to represent the transformations
performed by $C_1$ and $C_2$ in a canonical form which supports efficient equivalence-checking. The
latter  strategy  may  use  binary-decision  diagrams,  such  as  reduced  ordered  binary  decision 
diagrams  (ROBDDs),  and  QuIDDs  [149]  or  quantum  multiple  valued  decision  diagrams 
(QMDDs)  [146].  The  former  can  be  implemented  with  either  decision  diagrams  or  Boolean 
Satisfiability  solvers  by  reducing  Circuit-SAT  to  CNF-SAT.  In  particular,  for  conventional  miters 
one  needs  to  check  that  the  output  functions  implement  the  constant  0  function.  In  addition  to  the 
basic  SAT  or  BDD-based  approaches,  finding  equivalent  signals  in  two  circuits  is  often  very 
helpful  [141].  Such  techniques  appear  useful  for  reversible  circuits  as  well,  as  shown  in  our 
experiments.  Relevant  computational  engines  are  discussed  next. 

ROBDD. Calculate the output functions of miter circuits, using ROBDDs as the primary data
structure. This technique cannot handle properly-quantum circuits.

QuIDD. Build functional representations of the given circuits C1 and C2, and check if the results
are identical. In particular, QuIDDPro [149, 150] builds multi-terminal decision diagrams called
QuIDDs that can capture properly-quantum circuits.

SAT. Given two reversible circuits, construct a CNF-SAT formula that is satisfied only by those
input combinations for which the two circuits produce different outputs. Then use a contemporary
SAT solver [137] to check satisfiability.
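
To make the SAT query concrete, the following minimal sketch asks the same question (is there an input on which the two circuits differ?) but answers it by exhaustive simulation rather than with a SAT solver. This is a stand-in for illustration only and is feasible just for small circuits; the gate encoding and the names `run` and `find_counterexample` are assumptions of this sketch, not part of the tools cited above.

```python
from itertools import product

# A gate is (controls, target): the target bit flips when every control bit is 1.
# NOT has no controls, CNOT has one, Toffoli has two.

def run(circuit, bits):
    """Apply a reversible circuit (a list of gates) to a tuple of 0/1 values."""
    state = list(bits)
    for controls, target in circuit:
        if all(state[c] for c in controls):
            state[target] ^= 1
    return tuple(state)

def find_counterexample(c1, c2, n):
    """Return an n-bit input on which c1 and c2 differ, or None if there is none."""
    for bits in product((0, 1), repeat=n):
        if run(c1, bits) != run(c2, bits):
            return bits
    return None

# Equivalent pair: two CNOTs that share only their target commute.
same_a = [((0,), 2), ((1,), 2)]
same_b = [((1,), 2), ((0,), 2)]
# Non-equivalent variant: a Toffoli gate appended at the end.
different = same_a + [((0, 1), 2)]

print(find_counterexample(same_a, same_b, 3))
print(find_counterexample(same_a, different, 3))
```

A satisfying assignment returned by a SAT solver plays the role of the counterexample here, and an unsatisfiable formula corresponds to `None`.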

SAT-based techniques can be dramatically improved by identifying intermediate equivalences and
then using them to break up a large SAT instance for the miter circuit into several smaller
instances. The state-of-the-art implementation found in the Berkeley ABC system [136] (the
“cec” command) uses bit-parallel functional simulation to identify potentially equivalent signals,
then proves or disproves the equivalence using (incremental) SAT. Established equivalences
simplify future SAT calls, while counterexamples found are used to refine the results of functional
simulation and often distinguish seemingly-equivalent signals. ABC also uses fraiging, a fast
simplification technique based on hashing. To use ABC in our experiments, we construct a
conventional (irreversible) circuit from a reversible circuit by replacing each gate with its
AND/XOR/NOT equivalent.

Common benchmarks for reversible synthesis can be verified in milliseconds by the above
techniques. Instead, we focus on scalable blocks of standard quantum algorithms, whose
optimization and equivalence-checking are critical to the success of quantum computers being
designed today. More concretely, we performed experiments with n-bit linear-nearest-neighbor
(LNN) CNOT gate circuits, a reversible ripple-carry adder circuit proposed in [148], mesh circuits
[138] and reversible multipliers. Given a (qu)bit ordering, a linear-nearest-neighbor CNOT gate
circuit is a circuit which realizes the functionality of a CNOT gate with target and control bits
k bits apart, by using only LNN gates (gates that operate only on adjacent qubits). Studies of LNN
architectures are important because several promising implementations of quantum computation
require the LNN architecture (also called the spin-chain architecture in the physics literature)
and allow only adjacent qubits to interact directly. Thus, standard quantum circuits must be
adapted to such architectures and modified to use only LNN gates. Specific transformations and
LNN circuits have been developed [138, 145]. The overhead of the LNN architecture in terms of the
number of gates is often limited by a small factor (3-5). Such physical-synthesis optimization
motivates the need for equivalence-checking against the original, non-LNN versions. Using
important components of Shor’s algorithm [138, 147] (adders, meshes and multipliers), we build
three types of equivalence-checking instances.

Same. Two equivalent circuits.

Different 1. Randomly add Toffoli gates at the end.

Different 2. Randomly add Toffoli gates at the beginning.
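
As an illustration of the LNN CNOT circuits mentioned above, the sketch below realizes a CNOT whose control and target are several wires apart using only adjacent-wire CNOT gates, and checks it against the direct gate by exhaustive simulation. It uses a naive construction (SWAP the control value next to the target, apply the CNOT, then SWAP back, with each SWAP written as three CNOTs); this is an assumption made for illustration and is not the optimized construction of [138, 145]. All function names are hypothetical.

```python
from itertools import product

def swap(a, b):
    """SWAP of wires a and b written as three CNOT gates (control, target)."""
    return [(a, b), (b, a), (a, b)]

def lnn_cnot(control, target):
    """CNOT(control -> target) for control < target, using adjacent-wire CNOTs only."""
    assert control < target
    hops = range(control, target - 1)
    # Carry the control value down until it sits next to the target.
    down = [g for w in hops for g in swap(w, w + 1)]
    # Undo the SWAPs in reverse order to restore the original wire contents.
    up = [g for w in reversed(hops) for g in swap(w, w + 1)]
    return down + [(target - 1, target)] + up

def run(circuit, bits):
    """Simulate a CNOT-only circuit on a tuple of 0/1 values."""
    state = list(bits)
    for c, t in circuit:
        state[t] ^= state[c]
    return tuple(state)

def equivalent(c1, c2, n):
    """Exhaustive check that two n-wire CNOT circuits agree on every input."""
    return all(run(c1, b) == run(c2, b) for b in product((0, 1), repeat=n))

lnn = lnn_cnot(0, 3)
print(len(lnn), equivalent(lnn, [(0, 3)], 4))
```

A pair such as `lnn_cnot(0, 3)` and the direct gate `[(0, 3)]` is a small example of a "Same" instance: the circuits are structurally different but functionally equivalent.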

Our empirical data for CNOT, adder and mesh circuits exhibit essentially the same trends.
Hence, we report results only for adders in Figure 48. All runtimes are for a Linux system with a
2.40GHz Intel Xeon CPU with 1GB RAM. We implemented n-bit reversible multipliers using
5n bitlines, including 2n bits for two inputs, 2n bits for the results, and n ancillae. For example,
the line n = 6 in the tables deals with 30-bit circuits. All methods other than “cec” timed out for
n = 8, requiring more than 1,000s.


Adder  verification  performed  by  several  techniques. 

Figure 48. Runtime results for equivalence-checking of reversible arithmetic circuits.

B.  Checking  Properly-Quantum  Circuits 

In  this  section,  we  show  that  our  proposed  techniques  can  handle  properly-quantum  gates,  but 
remain  compatible  with  fast  special-case  methods. 

1)  Utility  of  Reversible  Miters:  Earlier  sections  focused  on  equivalence-checking  of  reversible 
circuits  which  appear  in  modules  of  quantum  algorithms  and  require  physical  synthesis 
optimizations  [138]  that  must  be  verified.  However,  other  important  modules  in  quantum 
algorithms,  such  as  the  Quantum  Fourier  Transform,  are  properly-quantum,  and  conventional 
circuits,  such  as  modular  exponentiation,  can  be  optimized  for  performance  using 
properly-quantum  gates.  Fortunately,  simple  cancellations  in  reversible  miters  can  be  used  with 
properly-quantum  circuits.  Reduced  properly-quantum  miters  can  be  verified  using  symbolic 
simulation  with  QuIDDPro  [149]  or  QMDD  software  [146].  Using  reversible  miters  as 
pre-processors can dramatically decrease overall runtime. We empirically compare the following
two  methods. 

With Local Reduction. Before invoking QuIDDPro, reduce the miter through local simplification.

W/o  Local  Reduction.  Apply  QuIDDPro  directly  to  the  miter. 
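
The local reduction step can be sketched as follows, under the simplifying assumption that every gate is self-inverse (true for NOT, CNOT, Toffoli and Hadamard, but not for the phase rotations in a QFT, which would need explicit inverses). The sketch builds the reversible miter and cancels adjacent identical gates with a stack; it omits any commutation rules, and the gate encoding and function names are hypothetical.

```python
def inverse(circuit):
    """Inverse of a circuit of self-inverse gates: the same gates in reverse order."""
    return list(reversed(circuit))

def reduce_miter(c1, c2):
    """Build the miter c1 . c2^{-1} and cancel adjacent identical gates."""
    stack = []
    for gate in c1 + inverse(c2):
        if stack and stack[-1] == gate:
            stack.pop()          # gate . gate = identity for a self-inverse gate
        else:
            stack.append(gate)
    return stack

# Gates are (name, wires) tuples.
circuit = [("H", (0,)), ("CNOT", (0, 1)), ("TOFFOLI", (0, 1, 2))]
# A copy with one gate added in the middle.
modified = circuit[:2] + [("NOT", (2,))] + circuit[2:]

print(reduce_miter(circuit, circuit))
print(reduce_miter(circuit, modified))
```

When the two circuits are identical, the miter cancels completely and no simulator call is needed. When they differ, only the gates around the modification survive, and that residue is what would be handed to the symbolic simulator.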

For  properly-quantum  circuit  benchmarks,  we  used  QFT  and  modular  exponentiation  modules 
from  circuits  that  implement  Shor’s  factorization  algorithm  on  an  LNN  architecture  [138].  For 
each  benchmark  circuit  with  n  inputs,  we  studied  five  cases  (new  gates  were  added  in  the  middle). 

Same.  Two  identical  copies  of  a  benchmark  circuit. 

Different  1.  A  circuit  and  its  copy  with  one  gate  added. 

Different  2.  A  circuit  and  its  copy  with  two  gates  added. 

Different  3.  A  circuit  and  its  copy  with  one  gate  deleted. 


Different  4.  A  circuit  and  its  copy  with  two  gates  deleted. 


Empirical results in Tables III and IV (Figure 49) and Tables V and VI (Figure 50) show that in
the “Same” case, our local simplifications are sufficient to conclude equivalence. Otherwise,
QuIDDPro is invoked after simplification, even though only several gates remain. However, in
“Diff. 2” many gates remain after simplification. For each case, columns “simp.” and “QuIDD”
report runtimes in seconds for local simplification and the QuIDDPro calls, respectively.
Time-outs are shown with “>1,000”. Local reduction is useful even for properly-quantum circuits.

TABLE III

Verification of QFT circuits with local reduction (runtimes in seconds).

      Same            Diff. 1         Diff. 2         Diff. 3         Diff. 4
   n  simp.   QuIDD   simp.   QuIDD   simp.   QuIDD   simp.   QuIDD   simp.   QuIDD
   4  0       0       0       0.03    0       0.05    0       0.04    0       0.05
   8  0       0       0.01    0.03    0       0.17    0       0.04    0       0.26
  16  0.05    0       0.07    0.05    0.08    0.26    0.06    0.04    0.07    0.05
  32  0.73    0       1.11    0.04    1.13    9.17    0.99    0.04    1.22    0.08
  64  17.29   0       24.32   0.05    25.48   0.52    24.33   0.06    30.35   0.12
 128  354.52  0       366.2   0.04    497.21  >1,000  522.57  0.04    580.11  0.39

TABLE IV

Verification of QFT circuits without local reduction (runtimes in seconds).

      Same            Diff. 1         Diff. 2         Diff. 3         Diff. 4
   n  simp.   QuIDD   simp.   QuIDD   simp.   QuIDD   simp.   QuIDD   simp.   QuIDD
   4  0       0.15    0       0.15    0       0.16    0       0.14    0       0.14
   8  0       1.75    0       1.80    0       1.97    0       1.74    0       1.83
  16  0.01    >1,000  0.01    >1,000  0       >1,000  0.01    >1,000  0       >1,000
  32  0.05    >1,000  0.04    >1,000  0.04    >1,000  0.04    >1,000  0.04    >1,000
  64  2.35    >1,000  2.24    >1,000  2.42    >1,000  2.33    >1,000  2.48    >1,000

Figure  49.  Runtime  results  for  equivalence-checking  of  Quantum  Fourier  Transforms. 

TABLE V

Verification of modular multiplication with local reduction (runtimes in seconds).

      Same            Diff. 1         Diff. 2         Diff. 3         Diff. 4
   n  simp.   QuIDD   simp.   QuIDD   simp.   QuIDD   simp.   QuIDD   simp.   QuIDD
   4  0.58    0       0.98    0.04    1.07    0.85    0.98    0.05    1.02    0.39
   8  2.13    0       3.72    0.04    3.69    0.37    3.73    0.04    3.45    1.19
  16  6.03    0       10.11   0.05    11.29   >1,000  11.16   0.05    11.26   5.73
  32  16.33   0       27.65   0.04    27.49   3.68    27.21   0.05    27.83   0.04
  64  36.28   0       58.32   0.02    59.27   0.56    60.91   0.05    60.13   1.33
 128  74.77   0       119.71  0.04    120.98  1.88    120.83  0.05    121.59  52.55

TABLE VI

Verification of modular multiplication w/o local reduction (runtimes in seconds).

      Same            Diff. 1         Diff. 2         Diff. 3         Diff. 4
   n  simp.   QuIDD   simp.   QuIDD   simp.   QuIDD   simp.   QuIDD   simp.   QuIDD
   4  0.09    >1,000  0.09    >1,000  0.1     >1,000  0.09    >1,000  0.09    >1,000
   8  0.23    >1,000  0.23    >1,000  0.23    >1,000  0.24    >1,000  0.24    >1,000

Figure  50.  Runtime  results  for  equivalence-checking  of  modular  multiplication. 


For  a  more  convincing  example,  we  check  equivalence  between  an  LNN  and  non-LNN 
implementation  (without  measurement  gates)  of  Shor’s  algorithm  for  factoring  the  number  15. 
These  equivalent  properly-quantum  circuits  include  2,732  gates  for  the  non-LNN  version  and 
5,120  gates  for  the  LNN  version.  Their  structure  is  very  different.  For  equivalence-checking,  we 
used  QuIDDPro  with  and  without  local  reduction,  and  these  runs  completed  in  59.07s  and 
64095.22s, respectively. These results strongly support the effectiveness of local reductions
with reversible miters of properly-quantum circuits.

2)  Proposed  Method:  Boosting  Verification  by  Using  SAT-based  Combinational  Tools:  Local 
reduction  may  leave  many  gates  around,  after  which  QuIDDPro  tends  to  consume  significant  time 
and  memory.  However,  if  very  few  properly-quantum  gates  remain,  a  more  lightweight 
verification  procedure  may  be  used.  Generic  symbolic  simulators,  such  as  QuIDDPro,  do  not  scale 
as  well  as  state-of-the-art  SAT-based  combinational  equivalence-checking.  Hence  we  leverage 
SAT-based  tools  to  boost  equivalence-checking  of  quantum  circuits. 

For two circuits C1 and C2, we do the following.

Step 1. Construct the miter circuit $C = C_1 \cdot C_2^{-1}$.

Step 2. Perform simplification of the miter circuit.

Step 3. If properly-quantum gates remain, go to Step 4; otherwise invoke state-of-the-art
SAT-based combinational equivalence-checking (the “cec” command of the ABC system [136]) to tell
if the miter circuit is Identity.

Step 4. Find the longest sequence of conventional logic gates (NOT, CNOT, Toffoli) in the miter
circuit. Label this sequence $C_a$. Let the simplified miter circuit be $Q_a \cdot C_a \cdot Q_b$.

Step 5. Transform $Q_a \cdot C_a \cdot Q_b$ to $C_a \cdot Q_b \cdot Q_a$. Note that
$Q_a \cdot C_a \cdot Q_b = I$ (Identity) iff $C_a \cdot Q_b \cdot Q_a = I$, as shown earlier. Move
conventional gates in $Q_b \cdot Q_a$ to the front of the miter as much as possible, creating a
transformed miter $C'_a \cdot Q'_b$, where $C'_a$ and $Q'_b$ are a reversible circuit and a
properly-quantum circuit, respectively.

Step 6. Check the functionality of $Q'_b$ by lightweight iterated simulation. If it is
properly-quantum, conclude that the miter circuit is not Identity. Else, go to Step 7.

Step 7. Extract the logic functionality of $Q'_b$, and let $C_b$ be a conventional circuit that
implements it. Then, check whether $C'_a \cdot C_b$ is Identity or not.
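
Steps 4 and 5 can be sketched as a list manipulation: find the longest contiguous run of conventional gates and rotate the miter so that this run comes first. The rotation is safe because $Q_a \cdot C_a \cdot Q_b = I$ implies $C_a \cdot Q_b = Q_a^{-1}$, and hence $C_a \cdot Q_b \cdot Q_a = I$. The sketch below covers only this rotation, not the subsequent movement of remaining conventional gates toward the front; the gate encoding and function names are hypothetical.

```python
CLASSICAL = {"NOT", "CNOT", "TOFFOLI"}

def longest_classical_run(miter):
    """Return (start, end) of the longest contiguous run of conventional gates."""
    best = (0, 0)
    start = None
    # A sentinel non-classical gate closes a run that reaches the end of the list.
    for i, (name, _) in enumerate(miter + [("END", ())]):
        if name in CLASSICAL:
            if start is None:
                start = i
        else:
            if start is not None and i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best

def rotate_to_front(miter):
    """Rewrite Qa . Ca . Qb as Ca . (Qb . Qa); returns (Ca, Qb + Qa)."""
    s, e = longest_classical_run(miter)
    qa, ca, qb = miter[:s], miter[s:e], miter[e:]
    return ca, qb + qa

miter = [("H", (0,)), ("CNOT", (0, 1)), ("TOFFOLI", (0, 1, 2)),
         ("NOT", (2,)), ("H", (1,)), ("CNOT", (1, 2))]
ca, rest = rotate_to_front(miter)
print(ca)
print(rest)
```

The conventional part `ca` can then be handed to a combinational equivalence checker, while the usually much shorter remainder is examined with a quantum simulator.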

Consider the properly-quantum circuit shown on the left-hand side of Figure 51.


Figure  51.  An  example  of  circuit  restructuring  after  simplification. 

Here we assume that $C_1$ is relatively large. Then after Step 5, we can get the right-hand side
circuit from the left-hand side circuit in Figure 51. Our miter becomes $C_1 \cdot Q_2$ where
$C_1$ is reversible but $Q_2$ is properly-quantum. This avoids a heavy-duty generic quantum
simulator for $C_1$. A key observation is that the functionality of $Q'_b$ (at Step 6) should be
classical (the inverse of $C'_a$) if the entire miter is Identity. Thus, if $Q'_b$ is
properly-quantum, the miter circuit is not Identity. When $Q'_b$ has few gates, this can be checked
efficiently by a generic quantum simulator. By Step 7, properly-quantum gates are reduced, and we
can use state-of-the-art SAT-based combinational equivalence-checking. By avoiding heavy-duty
generic quantum simulation, our adaptive method can achieve significant speed-ups when $C'_a$ is
large.

To validate our method, we studied circuits implementing one iteration of Grover’s quantum
algorithm for search [147], as shown in Figure 51. A particular step of the algorithm, called the
oracle, is implemented with a reversible circuit module $C_f$ based on a user-defined Boolean
function f (search predicate). To make verification more challenging, we configured a search
predicate that contains a multiplier circuit. We then created an equivalent variant of $C_f$ by
applying a global, rather than local, circuit transform. Namely, we applied a certain wire
permutation on inputs of $C_f$ and its inverse on outputs of $C_0$. This permutation was
implemented by applying SWAP gates to (all) pairs of adjacent wires and then breaking down each
SWAP gate into three CNOT gates. In our case study, the proposed procedure goes as follows.

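Step 1. Construct the miter circuit
$C = C_1 \cdot C_2^{-1} = C_f^{(1)} \cdot W^{(1)} \cdot C_0^{(1)} \cdot W^{(1)} \cdot (W^{(2)})^{-1} \cdot (C_0^{(2)})^{-1} \cdot (W^{(2)})^{-1} \cdot (C_f^{(2)})^{-1}$,
where superscripts (1) and (2) mark the modules of C1 and C2, respectively.

Step 2. Simplify the miter circuit. Because of the inserted SWAP gates (if we use only naive
cancellation rules), we cannot cancel the two pairs $C_f^{(1)}$ and $(C_f^{(2)})^{-1}$, or
$C_0^{(1)}$ and $(C_0^{(2)})^{-1}$. But we can remove the sequence $W^{(1)} \cdot (W^{(2)})^{-1}$,
reducing the miter to
$C_f^{(1)} \cdot W^{(1)} \cdot C_0^{(1)} \cdot (C_0^{(2)})^{-1} \cdot (W^{(2)})^{-1} \cdot (C_f^{(2)})^{-1}$.

Step 3. Since properly-quantum gates remain, go to Step 4.

Steps 4 and 5. Move $(C_f^{(2)})^{-1}$ to the input side of the circuit to maximize the
conventional logic part in the prefix. The miter becomes $C'_a \cdot Q'_b$ where
$C'_a = (C_f^{(2)})^{-1} \cdot C_f^{(1)}$ and
$Q'_b = W^{(1)} \cdot C_0^{(1)} \cdot (C_0^{(2)})^{-1} \cdot (W^{(2)})^{-1}$.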

Steps 6 and 7. Using techniques described earlier, combine a generic quantum simulator
(QuIDDPro [149, 150]) and state-of-the-art SAT-based combinational equivalence-checking (the
“cec” command of the ABC system [136]).

The  above  technique  is  compared  to  constructing  a  miter  circuit  and  applying  the  symbolic 
simulator  QuIDDPro  [149,  150]  to  the  miter.  QuIDDPro  alone  does  not  finish  in  ten  hours,  but  our 
technique  completes  in  under  seven  seconds. 


5.4.  Summary 

We  have  studied  several  techniques  for  equivalence-checking  of  reversible  circuits,  including  the 
new  concept  of  reversible  miters.  In  particular,  we  have  observed  that  state-of-the-art  SAT-based 
combinational  equivalence-checking  (cec)  can  be  adapted  to  this  context  and  outperforms  generic 
quantum  techniques.  Basic  BDD-based  techniques  usually  outperform  SAT-based  techniques,  but 
not  cec.  As  is  the  case  with  Automatic  Test-pattern  Generation,  reversibility  can  significantly 
simplify  equivalence-checking,  while  these  simplifications  are  compatible  with  other  techniques 
and amplify them. We then proposed an adaptive method to verify quantum circuits more
efficiently  than  the  existing  quantum  circuit  verification  tools  by  combining  them  with  the 
state-of-the-art  SAT-based  combinational  equivalence-checking  tool  for  the  conventional  circuits. 
Experiments  suggest  that  reversible  miters  are  useful  for  the  verification  of  reversible  circuits  as 
well  as  properly-quantum  circuits. 

6. Ising Systems and Quantum Adiabatic Computation


With  the  prospect  of  atomic-scale  computing,  we  are  studying  cumulative  energy  profiles  of 
spin-spin  interactions  in  non-ferromagnetic  lattices  (spin-glasses) — an  established  topic  in 
statistical  and  solid-state  physics.  Recent  proposals  suggest  using  Ising  spin-glasses  for 
non-traditional  computing  as  a  way  to  harness  nature’s  ability  to  find  min-energy  states,  and  to 
take  advantage  of  quantum  tunneling  to  boost  combinatorial  optimization.  We  therefore 
developed  EDA-inspired  high-performance  algorithms  to  simulate  natural  energy  minimization  in 
Ising  systems.  Unlike  previous  work,  our  algorithms  are  not  limited  to  planar  Ising  topologies.  In 
one  CPU-day,  our  branch-and-bound  algorithm  finds  min-energy  (ground)  states  on  100  spins, 
while  our  local  search  approximates  ground  states  on  1,000,000  spins.  We  use  this  computational 
tool  to  study  the  significance  of  hyper-couplings  in  recently  implemented  adiabatic  quantum 
computers. 

6.1.  Introduction 

As  leading  CMOS  foundries  are  gearing  for  mass  production  of  22nm  and  32nm  CMOS  chips, 
long-term  EDA  research  has  started  to  explore  the  use  of  atomic  properties  in  computing.  This 
exploration  relies  on  established  computational  models  of  atomic-scale  phenomena,  but  struggles 
to  connect  different  levels  of  abstraction — spin-level  models  and  energy-based  statistical  macro 
models.  The  spin-glass  model  was  proposed  by  Edwards  and  Anderson  [160]  as  a  variation  of  the 
Ising  model  to  study  disorder  and  frustration  in  crystallized  solids.  Such  systems  are  composed  of 
particles  that  can  be  in  either  of  two  possible  energy  spin  states.  These  spins  interact  in  pairs  to 
produce  an  energy  landscape  that  describes  the  overall  behavior  of  the  system.  The  model  is 
described  in  graph-theoretic  terms  by  representing  atoms  in  a  crystal  with  vertices  and  bonds 
between  atoms  with  edges.  Since  physical  systems  found  in  nature  are  often  disordered,  the  bond 
edges are assigned random weight values. Furthermore, some physical and chemical properties of
a  crystal  depend  on  the  total  energy  of  the  bonds,  which  depend  on  atomic  states.  Thus,  estimating 
total  energy  via  a  graph-based  function  facilitates  the  use  of  graph  algorithms  to  study  properties  of 
solids.  In  particular,  we  are  interested  in  finding  the  spin  configuration  that  produces  the  least 
amount  of  energy.  This  configuration  is  known  as  the  ground  state  of  the  system.  Barahona  [155] 
proved  that  for  general  Ising  spin  systems  (spin-glasses),  the  ground-state  determination  problem 
(GSD)  is  NP-hard. 

Since  many  physical  systems  have  a  natural  ability  to  find  least-energy  states  quickly,  researchers 
are  currently  attempting  to  exploit  this  phenomenon  to  perform  useful  computation.  At  the  atomic 
scale,  energy  minimization  can  be  aided  by  quantum  tunneling,  which  effectively  reduces  the 
number  of  local  minima.  Thus,  GSD  problems  are  of  particular  interest  to  quantum-information 
researchers  because  they  are  suitable  candidates  for  evaluating  the  performance  of  adiabatic 
quantum  computers  (AQCs). 

Recently developed AQCs employ an architecture based on Ising spin systems [167]. First, the
spin  system  is  configured  to  represent  a  given  combinatorial  problem,  i.e.,  the  spin  interactions  are 
carefully  controlled  rather  than  random  as  in  spin  glasses.  The  ground  state  is  approximated  via 
quantum  annealing  (the  quantum  analogue  of  thermal  annealing),  then  read  off  as  a  bit  sequence 
and  interpreted  as  an  answer  to  the  problem.  Since  the  complexity  of  GSD  is  universal  with 
respect  to  both  quantum  and  conventional  forms  of  computation  [155,  164],  it  is  important  to 
consider how well an approximation to the ground-state energy can be obtained by purely classical
techniques. Consequently, Bansal et al. [154] proposed an approximation algorithm for GSD on
Ising spin lattices, which essentially simulates these AQC architectures [162], and thus limits
their potential for quantum speed-ups. To approximate the least energy with ε accuracy, the
algorithm from [154] requires runtime exponential in 1/ε, which is impractical. In contrast, we
propose a branch-and-bound algorithm and a high-performance local search that quickly finds
near-optimal energy values for arbitrary Ising topologies. Such techniques can be used to
critically assess the performance of non-traditional computing devices based on energy
minimization in Ising spin-glasses and also to determine the best implementation options.


6.2.  Previous  Work 

In the Edwards-Anderson [160] model, spins are binary ±1 values, and the strengths of the atomic
couplings are independent and identically distributed random variables according to some
probability distribution. The most common distributions used are the Gaussian and the
±1-bimodal distributions. Let $G_{Ising} = (V, E)$ denote an Ising-model graph with n vertices
(spins). Each vertex $u \in V$ is denoted by spin value $s_u \in \{\pm 1\}$ and is assigned a
magnetization weight $h_u$. For $u, v \in V$, we define $(u, v) \in E$ to be an edge representing
a bond between two adjacent spins with assigned weight $J_{u,v}$ chosen randomly from the
standard Gaussian distribution (mean 0, standard deviation 1). Thus, we expect that half of the
bonds in the graph will be negative. Usually, the same positive-to-negative bond ratio is
maintained when the ±1-bimodal distribution is used instead. The internal energy of the system
for a particular configuration of spin values $\sigma = \{s_i\}$ is given by

$$E(\sigma) = -\sum_{(i,j)} J_{ij} s_i s_j - \sum_{i} h_i s_i,$$

where the first summation considers all pairs of adjacent spins.
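
The energy function can be evaluated directly from an edge list. The following minimal sketch assumes each bond appears once in a dictionary keyed by vertex pairs; the data layout and the name `ising_energy` are choices made for this illustration.

```python
def ising_energy(spins, J, h):
    """E(sigma) = -sum_{(i,j)} J_ij s_i s_j - sum_i h_i s_i.

    spins: dict vertex -> +1 or -1
    J:     dict (i, j) -> bond weight, each edge listed once
    h:     dict vertex -> magnetization weight (missing vertices count as 0)
    """
    bond = sum(w * spins[i] * spins[j] for (i, j), w in J.items())
    field = sum(h.get(v, 0.0) * s for v, s in spins.items())
    return -bond - field

# A frustrated triangle: three negative bonds cannot all be satisfied at once.
J = {(0, 1): -1.0, (1, 2): -1.0, (0, 2): -1.0}
h = {}
print(ising_energy({0: 1, 1: 1, 2: 1}, J, h))
print(ising_energy({0: 1, 1: 1, 2: -1}, J, h))
```

With this sign convention a positive bond favors aligned spins and a negative bond favors opposing spins, which is why the all-aligned configuration of the triangle has the higher energy.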

Putting together the energies of all spin configurations gives the Hamiltonian of the system.
Thus, the ground state is given by $E_{gs} = \min\{E(\sigma) \mid \sigma \in \pi_n\}$, where
$\pi_n$ is the set of all possible n-spin configurations. Whether we are interested in the
lowest-energy value or the n-spin configuration with such energy, $|\pi_n| = 2^n$ because each of
the spins can take on one of two possible values. Energy minimization is typically NP-hard. Thus,
calculating the ground state exactly using an exhaustive search algorithm is feasible only for
small Ising models. To provide a scalable way of finding ground states or approximating their
energies, heuristics are required, such as those developed in our work.
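
For reference, exhaustive search over all $2^n$ configurations takes only a few lines, which also makes its exponential cost plain. This is a baseline sketch for small n, not one of the algorithms developed in this work; the function name and data layout are assumptions.

```python
from itertools import product

def exhaustive_ground_state(n, J, h):
    """Scan all 2**n configurations.

    J maps (i, j) to a bond weight (each edge once); h is a list of field weights.
    Returns (lowest energy, spin tuple achieving it).
    """
    best_energy, best_spins = float("inf"), None
    for spins in product((1, -1), repeat=n):
        energy = -sum(w * spins[i] * spins[j] for (i, j), w in J.items())
        energy -= sum(h[i] * spins[i] for i in range(n))
        if energy < best_energy:
            best_energy, best_spins = energy, spins
    return best_energy, best_spins

# A ferromagnetic chain of four spins with a small field on the first spin.
chain_J = {(0, 1): 1, (1, 2): 1, (2, 3): 1}
chain_h = [0.5, 0, 0, 0]
print(exhaustive_ground_state(4, chain_J, chain_h))
```

Each additional spin doubles the number of configurations scanned, so this approach is useful mainly as a correctness check for faster solvers.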

Ising-model graphs are not limited to a particular topology, but two- and three-dimensional
lattices are most commonly considered in the literature. To simulate the behavior of infinite spin
glasses, it is common to require that the spins lying on the dimensional boundary be connected to
the spins on the opposing boundary on the same dimension. This can be viewed as a type of
(periodic) boundary condition. In particular, only one periodic boundary condition is imposed for
each dimension of the lattice. However, it is sometimes desirable to have boundary conditions on
some but not all of the lattice dimensions.
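
The effect of periodic boundary conditions on the graph can be seen in a small edge-list generator for a two-dimensional lattice, where each dimension is wrapped independently. The function name, the vertex numbering and the restriction to two dimensions are assumptions of this sketch.

```python
def lattice_edges(rows, cols, periodic_rows=False, periodic_cols=False):
    """Edges of a rows x cols grid; vertex (r, c) has index r * cols + c."""
    edges = set()
    for r in range(rows):
        for c in range(cols):
            u = r * cols + c
            if c + 1 < cols:
                edges.add((u, u + 1))                 # bond to the right neighbor
            elif periodic_cols and cols > 2:
                edges.add((r * cols, u))              # wrap last column to first
            if r + 1 < rows:
                edges.add((u, u + cols))              # bond to the neighbor below
            elif periodic_rows and rows > 2:
                edges.add((c, u))                     # wrap last row to first
    return sorted(edges)

print(len(lattice_edges(3, 3)))
print(len(lattice_edges(3, 3, periodic_cols=True)))
print(len(lattice_edges(3, 3, periodic_rows=True, periodic_cols=True)))
```

Wrapping is skipped for dimensions of size 1 or 2, where it would create a self-loop or duplicate an existing bond.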

Although most variations of GSD are known to be NP-hard [155], there are a few cases where the
structure of the graph can be exploited to solve the problem in polynomial time. For example,
Bieche et al. [158] proved that the GSD problem on planar graphs with zero magnetization can be
solved in polynomial time by showing a reduction to the minimum-weight perfect matching
(MWPM) problem. It follows from their work that GSD instances with zero magnetization
($h_i = 0$) and 0- or 1-periodic boundary conditions can be solved in O(n) time. However,
although MWPM is solvable in polynomial time, the runtime is impractical for large instances and
suffers from a big memory footprint. To overcome these limitations, the work in [165] describes a
heuristic based on the MWPM reduction. Another special case considered in the literature is that
of ferromagnetic ($J_{ij} > 0$) GSD instances. Barahona [156] reduced this particular problem to
(s-t)-min-cut or max-flow.

6.3.  Finding  Exact  Ground  States 

In order to better control trade-offs between runtime and solution quality obtained from
heuristics, it is important to design algorithms that are guaranteed to find exact ground states.
Although such algorithms are unlikely to scale to large instances, the exact solution obtained
from smaller instances can be used to debug and evaluate the performance of more scalable
heuristics. Branch-and-bound (B&B) algorithms consider incomplete or partial solutions, where only
some variables are assigned values. Partial solutions are systematically constructed via a
branching process that develops partial solutions that are deemed promising, i.e., those that may
lead to the optimal solution. Partial solutions whose cost is too high are “bounded away” or
pruned.

Our branching process proceeds as follows. First, all spins are labeled as unassigned, i.e., their
value can be set in the future to either 1 or -1. The algorithm then calculates a lower bound
$E_{lb}$ for minimum energy. It then selects a spin i and branches on one of the possible values.
In each branch, the incremental change in $E_{lb}$ caused by the assignment is recorded as
follows. For each spin j adjacent to i that has already been assigned, increase (decrease)
$E_{lb}$ by twice the amount of the positive (negative) bond connecting i and j if they have
opposing (aligned) spin values. Similarly, the corresponding change due to the magnetization of
the spin is also recorded (formulas for these updates are available in our technical papers and
source code).

Once the spin is assigned, the algorithm branches to another spin and repeats the same procedure.
When all spins have been assigned, $E_{lb}$ represents the energy of the spin configuration
generated by the branching process. To continue searching the configuration space, the branching
process backtracks to the last assigned spin, flips its value and updates $E_{lb}$. If both spin
values have already been tried, then the algorithm continues backtracking while relabeling the
spins as unassigned. Since each spin can take one of two values, this branching process generates
a full binary search tree where the leaves correspond to all possible spin configurations in the
Ising system. Initially, we use a linear-time greedy approximation E as our bounding value. During
the branching process, if the energy of the partial solution exceeds E, then we can safely prune
this branch and backtrack without making any further assignments. The algorithm either tries the
opposite spin value or backtracks again if both spin values have already been tried. If the search
assigns all the spins in the graph and the corresponding minimal energy state is lower than E,
then we set E to this new energy value. After searching all promising branches, E will assume the
ground-state energy. In our experiments, this standard bounding technique alone improved the
scalability of the branching process by an order of magnitude over exhaustive search.
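
The branching and bounding scheme described above can be sketched compactly. This is a minimal illustration under several assumptions that the text does not fix: spins are branched in index order, the greedy incumbent assigns spins one at a time to whichever value raises the bound least, and a branch is pruned as soon as its bound reaches (rather than strictly exceeds) the incumbent. The prune-by-dominance technique is not included, and all names are hypothetical.

```python
def branch_and_bound(n, J, h):
    """Return (ground-state energy, spins).

    J maps (i, j) to a bond weight (each edge once); h is a list of field weights.
    """
    nbrs = [[] for _ in range(n)]
    for (i, j), w in J.items():
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))
    spins = [0] * n                              # 0 marks an unassigned spin

    def delta(i, s):
        """Increase of the lower bound when unassigned spin i is set to s."""
        d = abs(h[i]) - h[i] * s                 # 0 if the field term is satisfied, else 2|h_i|
        for j, w in nbrs[i]:
            if spins[j] != 0:
                d += abs(w) - w * s * spins[j]   # 0 if the bond is satisfied, else 2|w|
        return d

    # Lower bound with nothing assigned: every bond and field term fully satisfied.
    e_lb0 = -sum(abs(w) for w in J.values()) - sum(abs(x) for x in h)

    # Greedy incumbent: fix spins in index order, taking the cheaper value each time.
    e = e_lb0
    for i in range(n):
        s = min((1, -1), key=lambda v: delta(i, v))
        e += delta(i, s)
        spins[i] = s
    best = [e, list(spins)]
    spins[:] = [0] * n

    def search(i, e_lb):
        if i == n:                               # all spins assigned: e_lb is the exact energy
            best[0], best[1] = e_lb, list(spins)
            return
        for s in (1, -1):
            d = delta(i, s)
            if e_lb + d < best[0]:               # otherwise prune: cannot beat the incumbent
                spins[i] = s
                search(i + 1, e_lb + d)
                spins[i] = 0                     # relabel as unassigned when backtracking

    search(0, e_lb0)
    return best[0], best[1]

triangle = {(0, 1): -1, (1, 2): -1, (0, 2): -1}
print(branch_and_bound(3, triangle, [0, 0, 0]))
```

The bound only ever increases along a branch, because each assignment replaces an optimistic term by its actual value. This is what makes pruning against the incumbent safe.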

To further improve the scalability of our B&B algorithm, we designed a prune-by-dominance
technique that consists of identifying partial solutions whose partial energy can be improved
(lowered) by modifying the configuration of currently assigned spins.




Figure  52.  Pruning  by  dominance  in  our  branch  and  bound  algorithm. 


Note that, whenever we assign a spin s, there is a set $F_s$ of spins adjacent to s for which all
neighboring spins have also been assigned. Figure 52 shows an example of s and $F_s$ on a small
grid. Note that $|F_s| \le \mathrm{degree}(s)$. The energy of the spins in $F_s$ is localized in
the sense that it will not be affected by further spin assignments (since all the neighbors of the
spins in $F_s$ are assigned). This allows us to evaluate a partial solution by flipping the values
of the spins in $F_s$ and comparing the partial energies. If any of the partial energies are
lower, then we know that the current partial solution is unpromising and we can backtrack. Observe
that in cases when different branches are statistically unlikely to have equal partial cost (e.g.,
when couplings and magnetizations are random), for two branches, the probability that the first
branch dominates the second branch is approximately 1/2. Let $0 < c < 1$ be the fraction of $2^n$
partial solutions that require branching. Then we can expect to prune $c 2^n / 2 = c 2^{n-1}$ of
these branches. This pruning technique improved the scalability of our B&B algorithm by 1-2
orders of magnitude on a variety of Ising benchmarks.

6.4.  GSD  through  Local  Search 

Due  to  the  difficulty  of  solving  GSD,  researchers  have  developed  heuristic  methods  typically 
based  on  slow  Monte  Carlo  simulations.  However,  because  of  the  role  that  Ising  models  play  in 
simulating  real-world  phenomena,  it  is  desirable  to  have  much  faster  techniques. 
Our local search is an iterative improvement algorithm that modifies the bipartition induced by an
arbitrary spin configuration: positive spins are placed in one partition and negative spins in the
other. Figure 53 compares the runtime of our approach with other optimal Ising solvers. The
algorithm performs a sequence of incremental changes to the bipartition, organized as passes.
These changes consist of spin-moves that remove a particular spin from its current partition and
place it in the opposite one. At the beginning of each pass, the energy differential (gain) of
performing each possible move is calculated. A positive gain implies that the move decreases the
overall energy, while a negative gain increases it. During a pass, the move that produces the largest
gain is selected and executed. The corresponding spin is then labeled as locked, i.e., it cannot be
selected again in the current pass. The pass continues selecting and executing the best moves until
all spins have been locked. At the end of the pass, we save the best-seen bipartition produced by
the sequence of moves. This bipartition is then used as the starting solution of the next pass. The
entire algorithm terminates when a pass fails to obtain an improvement. Note that, in the absence
of positive-gain moves, a negative-gain move can be selected. Thus, a pass may accept a solution
that is worse than the existing solution (hill-climbing). This helps to reduce the probability of
getting trapped in local minima. The initial random solution used in our local search is generated
in linear time.
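
The pass structure described above can be sketched in Python as follows. This is an illustrative
sketch under stated assumptions, not the project's code. It assumes the energy convention
E = -sum J_ij s_i s_j - sum h_i s_i, recomputes gains naively instead of updating them
incrementally, and uses hypothetical names (energy, gain, run_pass, local_search).

```python
import random


def energy(spins, J, h):
    # Assumed convention: E = -sum_{i<j} J_ij s_i s_j - sum_i h_i s_i
    pair = sum(w * spins[i] * spins[j]
               for i in J for j, w in J[i].items() if i < j)
    return -pair - sum(h[i] * spins[i] for i in spins)


def gain(i, spins, J, h):
    # Energy decrease obtained by flipping spin i (positive = improvement).
    return -2.0 * spins[i] * (h[i] + sum(w * spins[j] for j, w in J[i].items()))


def run_pass(spins, J, h):
    """Move every spin exactly once, then keep the best prefix of moves."""
    work, locked, moves = dict(spins), set(), []
    total, best_total, best_len = 0.0, 0.0, 0
    while len(locked) < len(work):
        # Best unlocked move; its gain may be negative (hill-climbing).
        i = max((v for v in work if v not in locked),
                key=lambda v: gain(v, work, J, h))
        total += gain(i, work, J, h)
        work[i] = -work[i]
        locked.add(i)
        moves.append(i)
        if total > best_total + 1e-12:
            best_total, best_len = total, len(moves)
    result = dict(spins)
    for i in moves[:best_len]:  # replay only the best-seen prefix
        result[i] = -result[i]
    return result, best_total


def local_search(J, h, starts=1, seed=0):
    """Repeat passes until none improves; keep the best of several starts."""
    rng = random.Random(seed)
    best = None
    for _ in range(starts):
        spins = {i: rng.choice((-1, 1)) for i in J}
        while True:
            spins, improved = run_pass(spins, J, h)
            if improved <= 0:
                break
        if best is None or energy(spins, J, h) < energy(best, J, h):
            best = spins
    return best
```

Selecting the best move by scanning all unlocked spins costs O(n) per move, which makes this sketch
quadratic per pass. It is meant only to show the locking and best-prefix logic.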

To increase the quality of solutions and the probability of finding the exact ground state, we repeat
the algorithm using multiple random initial solutions and select the best result. The relationship
between the number of random starts and solution quality is explored in our experiments. Each
move changes the local energy surrounding the selected spin; therefore, the gains of the
neighboring spins need to be updated after each move. These gain updates are performed
efficiently using a custom heap-based data structure. The data structure consists of two arrays.
The first array implements a traditional binary heap, while the second array allows quick access to
the heap-array element that contains the gain-update value of a particular spin. To perform a gain
update, we can access the specific value in O(1) time, update the value, and perform the necessary
swaps to maintain heap order. Since only O(log(n)) swaps are required in the worst case (where n is
the number of spins), our data structure allows us to perform gain updates much faster than naive
implementations that require scanning the entire set of n gain values. Since n moves are performed
during a pass, and only a constant number of passes are performed, the runtime of our local search
is O(n log(n)).
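
A two-array structure of the kind described above can be sketched in Python as follows. This is a
hedged illustration rather than the project's data structure: the class name GainHeap and its
methods are hypothetical, spins are assumed to be numbered 0..n-1, and the heap is a max-heap so
that the largest gain sits at the root.

```python
class GainHeap:
    """Binary max-heap of (gain, spin) pairs plus a spin -> index array."""

    def __init__(self, gains):
        self.heap = [(g, spin) for spin, g in enumerate(gains)]  # first array
        self.pos = list(range(len(gains)))                       # second array
        for k in range(len(self.heap) // 2 - 1, -1, -1):
            self._sift_down(k)

    def _swap(self, a, b):
        self.heap[a], self.heap[b] = self.heap[b], self.heap[a]
        self.pos[self.heap[a][1]] = a
        self.pos[self.heap[b][1]] = b

    def _sift_up(self, k):
        while k > 0:
            parent = (k - 1) // 2
            if self.heap[k][0] <= self.heap[parent][0]:
                break
            self._swap(k, parent)
            k = parent

    def _sift_down(self, k):
        n = len(self.heap)
        while True:
            big = k
            for child in (2 * k + 1, 2 * k + 2):
                if child < n and self.heap[child][0] > self.heap[big][0]:
                    big = child
            if big == k:
                return
            self._swap(k, big)
            k = big

    def update(self, spin, new_gain):
        """O(1) lookup through pos, then at most O(log n) swaps."""
        k = self.pos[spin]
        if k < 0:
            return  # spin was already popped (locked in this pass)
        old = self.heap[k][0]
        self.heap[k] = (new_gain, spin)
        if new_gain > old:
            self._sift_up(k)
        else:
            self._sift_down(k)

    def pop_max(self):
        """Remove and return the spin with the largest gain."""
        spin = self.heap[0][1]
        self._swap(0, len(self.heap) - 1)
        self.heap.pop()
        self.pos[spin] = -1
        if self.heap:
            self._sift_down(0)
        return spin
```

In a pass, pop_max would select and lock the next spin to move, and update would be called once for
each neighbor of the moved spin.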

6.5.  Empirical  Validation 

We have tested single-threaded implementations of the proposed algorithms on a conventional
Linux server. For small to medium-sized instances, our local search finds exact ground states in
95% of independent random starts; otherwise, solutions are 5% suboptimal on average. We
compared the average solution quality of local search for 2-dimensional Ising spin glasses with
Gaussian-distributed couplings and two periodic boundary conditions. The benchmarks are
differentiated by the number of spins included. For each instance we considered five different
levels of effort with an increasing number of independent random starts (1, ln(n), 10, ln^2(n) and n)
per instance per level of effort. To obtain the average solution quality we computed 1000 output
samples per instance per case considered. The exact ground states were obtained via the
University of Cologne’s Ising Spin Glass Server [168], which uses a branch-and-cut algorithm to
find exact ground states on grid spin glasses with zero magnetization. As expected, the solution
quality improves as the number of random starts increases. When we consider at least ln^2(n)
random starts, our local search produces high-quality solutions (> 95%) for five of the
benchmarks, while its runtime does not exceed 17 seconds for the largest benchmark with 2500
spins.

Note  that  the  quality  of  the  solutions  decreases  as  we  consider  larger  instances.  This  is  expected 
since  the  energy  landscape  gets  more  complicated  as  the  number  of  spins  is  increased  and  the 
probability  of  finding  the  exact  ground  state  becomes  exponentially  small.  However,  this  is  true  of 
all heuristics. We also experimented with spin glasses that have ±1-bimodal coupling
distributions.  Compared  to  Gaussian-distributed  instances,  the  expected  solution  quality  observed 
is  better.  Our  heuristic  finds  high-quality  solutions  for  all  but  one  of  the  benchmarks  using  a  single 
random  start. 

Our heuristic scales to a million spins, and empirical runtimes closely fit n log(n). We compared
the runtime of our local search against that of the MWPM-based heuristic proposed in [165]. The
solution quality of that heuristic depends on a particular choice of parameters, and the heuristic
does not work on Gaussian-distributed instances. In contrast, our heuristic has no such dependency
and works on all instances. Table IV in [165] shows the runtime and solution quality of the
MWPM-based heuristic on ±1 grid graphs of size 164x164. This heuristic takes 59 s on average (with
negligible deviation), producing the optimal value only 61% of the time. By comparison, our local
search heuristic on a comparable benchmark with a single random start takes about 8.5 s. Thus, we
can perform about 7 random starts in the same period of time.




7. Conclusion


This project investigated novel hybrid techniques that combine concepts from quantum information
science and classical computer science to solve hard computational problems, including the
handling of uncertainty. The problems considered include design optimization and simulation of
conventional CMOS and quantum systems, fault tolerance, resource allocation and scheduling,
strategy optimization, and related computationally challenging problems facing the Air Force.
The project accomplished the following:

•  An  analytical  study  of  probabilistic  fault  models  in  digital  logic  and  their  impact  on  overall 
circuit performance. Probabilistic faults affect electronics in high-altitude and
high-radiation  environments,  especially  state-of-the-art  deep-submicron  CMOS  chips. 

•  New  algorithmic  methodologies  and  tools  for  circuit  synthesis  and  test  to  mitigate  the 
impact  of  probabilistic  faults.  These  methodologies  include  fast  evaluation  of  circuit 
reliability  based  on  functional  simulation,  and  incremental  modification  of  a  circuit  to 
improve  its  robustness. 

•  Analysis  of  mobile  (wireless)  ad  hoc  communication  networks,  focused  on  network 
throughput  and  total  power  consumption. 

•  A  non-linear  programming  framework  for  spatial  optimization  of  mobile  networks,  its 
empirical  evaluation,  and  visualization  of  results. 

•  A  new  algorithm  to  simulate  quantum  circuits  which  exhibits  polynomial-time 
performance  in  several  important  cases.  This  algorithm  and  accompanying  theoretical 
results  have  been  subsequently  used  by  other  researchers  to  show  that  the  Quantum  Fourier 
Transform  can  be  simulated  in  polynomial  time  on  conventional  computers. 

•  Several techniques for verification of correctness of quantum circuits. These techniques
are based on computational engines frequently used in Electronic Design Automation,
namely Boolean satisfiability (SAT) and binary decision diagrams (BDDs). They fall under the
category of equivalence checking and can verify the results of adapting known circuits to
specific device architectures, such as linear ion traps.